PREFACE


The real payoff for artificial intelligence (AI) is applications. It is applications that have thrust
AI into prominence and commercialization in the 1980’s. This report presents overviews of key 
application areas: Expert Systems, Computer Vision, Natural Language Processing, Speech In- 
terfaces, and Problem Solving and Planning. The basic approaches to such systems, the state of 
the art, existing systems and future trends and expectations are covered. 

It is anticipated that this report will prove useful to engineering and research managers, poten- 
tial users and others who will be affected by the rapidly growing area of AI applications. 

This report is part of the NBS/NASA series of overviews on AI and Robotics. Due to the scope 
of AI, Volume I — Artificial Intelligence — is issued in three parts (this report being Part B): 

Part A: The Core Ingredients, NASA TM 85836, June 1983 

I. Artificial Intelligence — What It Is 

II. The Rise, Fall and Rebirth of AI 

III. Basic Elements of AI 

IV. Applications 

V. The Principal Participants 

VI. State-of-the-Art 

VII. Towards the Future 
Sources for Further Information 
Glossary 

Part B: Applications, NASA TM 85838, Sept. 1983 

I. Expert Systems 

II. Computer Vision 

III. Natural Language Processing 

IV. Speech Recognition and Speech Understanding 

V. Speech Synthesis 

VI. Problem-Solving and Planning 

Part C: Basic AI Topics, NASA TM 85839, Oct. 1983 

I. Artificial Intelligence and Automation 

II. Search-Oriented Automated Problem Solving and Planning 

III. Knowledge Representation 

IV. Computational Logic 
FOREWORD


The opening of the decade of the 80’s saw Artificial Intelligence (AI) transition from a primar- 
ily research topic to commercial applications. The full impact of this transition has yet to be felt. 

AI has been designated by the U.S. Defense Science Board as one of the top 10 major payoff 
areas for the military. It has been made the core ingredient of Japan’s Fifth Generation computer 
research project by which they seek to catapult Japan into the dominant information society in 
the 1990’s. Similar importance has been attached to AI in the U.S., Great Britain and France. 

This report summarizes the key AI application areas of Expert Systems, Computer Vision, 
Natural Language Processing, Speech Interfaces, and Problem Solving and Planning. More 
detailed information can be found in the following documents available from the National 
Technical Information Service (NTIS), Springfield, VA 22161. 

An Overview of Expert Systems, NBSIR 2505 
May 1982 (Revised October 1982) 

An Overview of Computer Vision, NBSIR 2582 
September 1982 

An Overview of Natural Language Processing 
NBSIR 83-2687, April 1983 
NASA TM 85635, April 1983 

Two emerging AI topics — Automatic Programming, and Machine Learning — are not treated 
separately in this report but are included under Expert Systems. 

This document is Part B of the three part report: 

An Overview of Artificial Intelligence and Robotics 
Volume I — Artificial Intelligence 

Part A — The Core Ingredients, NASA TM 85836, June 1983 
Part B — Applications, NASA TM 85838, Sept. 1983 
Part C — Basic AI Topics, NASA TM 85839, Oct. 1983 

The important AI application areas of robotics and automated manufacturing are treated in 
An Overview of Artificial Intelligence and Robotics 

Volume II — Robotics, NBSIR 82-2479, March 1982. 






I. EXPERT SYSTEMS 


A. Introduction 

Expert Systems is probably the “hottest” topic in Artificial Intelligence (AI) today. Prior to the 
last decade, in trying to find solutions to problems, AI researchers tended to rely on non- 
knowledge-guided search techniques or computational logic. These techniques were successfully 
used to solve elementary problems or very well structured problems such as games. However, in real
complex problems the search space tends to expand exponentially with the number of parameters
involved. For such problems, these older techniques
have generally proved to be inadequate and a new approach was needed. This new approach em- 
phasized knowledge rather than search and has led to the field of Knowledge Engineering and Ex- 
pert Systems. The resultant expert systems technology, limited to academic laboratories in the 
70’s, is now becoming cost-effective and is beginning to enter into commercial applications. 

B. What is an Expert System? 

Feigenbaum (1982, p. 1), a pioneer in expert systems, states:

An “expert system” is an intelligent computer program that uses knowledge and inference procedures to solve 
problems that are difficult enough to require significant human expertise for their solution. The knowledge 
necessary to perform at such a level, plus the inference procedures used, can be thought of as a model of the 
expertise of the best practitioners of the field. 

The knowledge of an expert system consists of facts and heuristics. The “facts” constitute a body of information 
that is widely shared, publicly available, and generally agreed upon by experts in a field. The “heuristics” are 
mostly private, little-discussed rules of good judgement (rules of plausible reasoning, rules of good guessing) that 
characterize expert-level decision making in the field. The performance level of an expert system is primarily a 
function of the size and quality of the knowledge base that it possesses. 

It has become fashionable today to characterize any large, complex AI system that uses large 
bodies of domain knowledge as an expert system. Thus, nearly all AI applications to real-world 
problems can be considered in this category, though the designation “knowledge-based systems” 
is more appropriate. 

C. The Basic Structure of an Expert System 

An expert system consists of: 

(1) a knowledge base (or knowledge source) of domain facts and heuristics associated with the 
problem; 

(2) an inference procedure (or control structure) for utilizing the knowledge base in the solu- 
tion of the problem; 

(3) a working memory — “global data base” — for keeping track of the problem status, the in- 
put data for the particular problem, and the relevant history of what has thus far been 
done. 
A human “domain expert” usually collaborates to help develop the knowledge base. Once the
system has been developed, in addition to solving problems, it can also be used to help instruct 
others in developing their own expertise. 

It is desirable, though not yet common, to have a user-friendly natural language interface to 
facilitate the use of the system in all three modes: development, problem solving, instruction. In 
some sophisticated systems, an explanation module is also included, allowing the user to chal- 
lenge and examine the reasoning process underlying the system’s answers. Figure 1-1 is a diagram 
of an idealized expert system. When the domain knowledge is stored as production rules, the 
knowledge base is often referred to as the “rule base,” and the inference engine as the “rule 
interpreter.” 

An expert system differs from more conventional computer programs in several important 
respects. Duda (1981, p. 242) observes that, in an expert system “. . . there is a clear separation of 
general knowledge about the problem (the rules forming a knowledge base) from information 
about the current problem (the input data) and the methods for applying the general knowledge 
to the problem (the rule interpreter).” In a conventional computer program, knowledge pertinent 
to the problem and methods for utilizing this knowledge are all intermixed, so that it is difficult to 
change the program. In an expert system, “. . . the program itself is only an interpreter (or
general reasoning mechanism) and (ideally) the system can be changed by simply adding or
subtracting rules in the knowledge base.”

Figure 1-1. Basic Structure of an Expert System. (Diagram labels: USER; KNOWLEDGE SOURCE; SYSTEM STATUS.)

D. The Knowledge Base 

The most popular approach to representing the domain knowledge (both facts and heuristics) 
needed for an expert system is by production rules (also referred to as “SITUATION-ACTION
rules” or “IF-THEN rules”).* Thus, often a knowledge base is made up mostly of rules which are
invoked by pattern matching with features of the task environment as they currently appear in the 
global data base. 
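
As an illustration of rules invoked by pattern matching against the global data base, the following minimal Python sketch may help. It is only a sketch under stated assumptions: the `Rule` class, the `enabled_rules` function and the toy rule contents are hypothetical names invented here, and are not taken from any system described in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A production (IF-THEN) rule; hypothetical structure for illustration."""
    name: str
    conditions: frozenset  # the IF side: facts that must all be present
    action: str            # the THEN side: a fact to add to the data base


def enabled_rules(rules, global_data_base):
    """Return the rules whose IF side matches the current global data base
    and whose conclusion has not already been recorded."""
    return [r for r in rules
            if r.conditions <= global_data_base
            and r.action not in global_data_base]


# Assumed toy knowledge base (invented for this example).
RULES = [
    Rule("r1", frozenset({"engine cranks", "no spark"}), "ignition fault"),
    Rule("r2", frozenset({"ignition fault", "coil ok"}), "check distributor"),
]

global_data_base = {"engine cranks", "no spark", "coil ok"}
matched = [r.name for r in enabled_rules(RULES, global_data_base)]
print(matched)  # ['r1']
```

Keeping the rules as plain data, separate from the matching function, mirrors the separation of knowledge base and rule interpreter described above: adding a rule to `RULES` changes the behavior without touching `enabled_rules`.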

E. The Control Structure

In an expert system a problem-solving paradigm must be chosen to organize and control the 
steps taken to solve the problem. A common but powerful approach involves the chaining of
IF-THEN rules to form a line of reasoning. The rules are actuated by patterns (which, depending on
the strategy, match either the IF or the THEN side of the rules) in the global data base. The ap- 
plication of the rule changes the system status and therefore the data base, enabling some rules 
and disabling others. The rule interpreter uses a control strategy for finding the enabled rules and 
for deciding which of the enabled rules to apply. The basic control strategies used may be top- 
down (goal driven), bottom-up (data driven), or a combination of the two that uses a relaxation- 
like convergence process to join these opposite lines of reasoning together at some intermediate 
point to yield a problem solution. However, virtually all the heuristic search and problem solving 
techniques that the AI community has devised have appeared in the various expert systems. 
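
The two basic control strategies described above, bottom-up (data driven) and top-down (goal driven) chaining of IF-THEN rules, can be sketched as follows. This is a minimal illustration, not the interpreter of any system named in this report; the rule format, the function names `forward_chain` and `backward_chain`, and the toy rules are assumptions made for the example.

```python
# Each rule is (IF-conditions, THEN-conclusion); hypothetical toy rules.
RULES = [
    ({"a", "b"}, "c"),
    ({"c"}, "d"),
    ({"d", "e"}, "f"),
]


def forward_chain(rules, facts):
    """Bottom-up (data driven): fire enabled rules until nothing new is added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)  # applying a rule changes the data base
                changed = True
    return facts


def backward_chain(rules, facts, goal, _seen=frozenset()):
    """Top-down (goal driven): try to prove goal via rules that conclude it."""
    if goal in facts:
        return True
    if goal in _seen:  # guard against circular rule chains
        return False
    return any(
        all(backward_chain(rules, facts, c, _seen | {goal}) for c in conditions)
        for conditions, conclusion in rules
        if conclusion == goal
    )


derived = forward_chain(RULES, {"a", "b"})
print(sorted(derived))                         # ['a', 'b', 'c', 'd']
print(backward_chain(RULES, {"a", "b"}, "d"))  # True
print(backward_chain(RULES, {"a", "b"}, "f"))  # False
```

Forward chaining matches the IF side of the rules against the data base, while backward chaining matches the THEN side against the current goal, which is the distinction drawn in the paragraph above.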

F. Uses of Expert Systems 

The uses of expert systems are virtually limitless. They can be used to: diagnose, repair, 
monitor, analyse, interpret, consult, plan, design, instruct, explain, learn, and conceptualize. 

G. Architecture of Expert Systems 

One way to classify expert systems is by function (e.g., diagnosis, planning, etc.). However,
examination of existing expert systems indicates that there is little commonality in detailed system 
architecture that can be detected from this classification. A more fruitful approach appears to be 
to look at problem complexity and problem structure and deduce what data and control struc- 
tures might be appropriate to handle these factors. 

The Knowledge Engineering community has evolved a number of techniques (presented in the 
excellent tutorial by Stefik et al. (1982) and summarized in Gevarter (1982)) which can be utilized 
in devising suitable expert system architectures. 

The use of these techniques in four existing expert systems is illustrated in Tables 1-1-1 thru
1-1-4, each of which outlines the basic approach taken by one of these expert systems and

*Not all expert systems are rule-based. The network-based expert systems MACSYMA, INTERNIST/CADUCEUS,
Digitalis Therapy Advisor, HARPY and PROSPECTOR are examples which are not. Buchanan and Duda (1982) state 
that the basic requirements in the choice of an expert system knowledge representation scheme are extendibility, 
simplicity and explicitness. Thus, rule-based systems are particularly attractive. 
TABLE 1-1-1. Characteristics of Example Expert Systems.

SYSTEM:       DENDRAL
INSTITUTION:  Stanford University
AUTHORS:      Feigenbaum & Lederberg
FUNCTION:     Data Interpretation

Purpose:
  Generate plausible structural representations of organic molecules from mass
  spectrogram data.

Approach:
  1. Derive constraints from the data.
  2. Generate candidate structures.
  3. Predict mass spectrographs for candidates.
  4. Compare with data.

Key Elements of Knowledge Base:
  • Rules for deriving constraints on molecular structure from experimental data
  • Procedure for generating candidate structures to satisfy constraints
  • Rules for predicting spectrographs from structures

Key Elements of Global Data Base:
  • Mass spectrogram data
  • Constraints
  • Candidate structures

Key Elements of Control Structure:
  • Forward chaining
  • Plan, generate and test
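
The plan, generate and test control structure listed in Table 1-1-1 can be sketched in a few lines of Python. This is a toy illustration only, not DENDRAL's actual method: the fragment names and masses, the peak model in `predict`, and all function names are assumptions invented for the example and do not represent real chemistry.

```python
from itertools import combinations_with_replacement

# Hypothetical fragment masses; purely illustrative, not real chemistry data.
FRAGMENT_MASS = {"A": 12, "B": 15, "C": 29}


def plan(observed_peaks):
    """Plan: derive constraints from the data (here, only the total mass)."""
    return {"total_mass": max(observed_peaks)}


def generate(constraints, max_fragments=4):
    """Generate: enumerate only candidates that satisfy the constraints."""
    for n in range(1, max_fragments + 1):
        for combo in combinations_with_replacement(sorted(FRAGMENT_MASS), n):
            if sum(FRAGMENT_MASS[f] for f in combo) == constraints["total_mass"]:
                yield combo


def predict(candidate):
    """Predict the peaks a candidate would produce (toy model: one peak per
    distinct fragment mass plus one for the whole structure)."""
    masses = {FRAGMENT_MASS[f] for f in candidate}
    masses.add(sum(FRAGMENT_MASS[f] for f in candidate))
    return masses


def passes_test(candidate, observed_peaks):
    """Test: keep a candidate only if its prediction matches the data."""
    return predict(candidate) == set(observed_peaks)


observed = [12, 29, 41]
survivors = [c for c in generate(plan(observed)) if passes_test(c, observed)]
print(survivors)  # [('A', 'C')]
```

The point of the planning step is that constraints derived from the data prune the generator, so the comparatively expensive predict-and-compare step runs on only a few candidates.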




TABLE 1-1-2. Characteristics of Example Expert Systems.

SYSTEM:       AM
INSTITUTION:  Stanford University
AUTHORS:      Lenat
FUNCTION:     Concept Formation

Purpose:
  Discovery of mathematical concepts.

Approach:
  • Start with elementary ideas in set theory.
  • Search a space of possible conjectures that can be generated from these
    elementary ideas.
  • Choose the most interesting conjectures and pursue that line of reasoning.

Key Elements of Knowledge Base:
  • Elementary ideas in finite set theory
  • Heuristics for generating new mathematical concepts by modifying and combining
    elementary ideas
  • Heuristics of “interestingness” for discarding bad ideas

Key Elements of Global Data Base:
  • Plausible candidate concepts

Key Elements of Control Structure:
  • Plan, generate, and test



TABLE 1-1-3. Characteristics of Example Expert Systems.

SYSTEM:       R1
INSTITUTION:  CMU
AUTHORS:      McDermott
FUNCTION:     Design

Purpose:
  Configure VAX computer systems (from a customer’s order of components).

Approach:
  Break problem up into the following ordered subtasks:
  1. Correct mistakes in order.
  2. Put components into CPU cabinets.
  3. Put boxes into unibus cabinets and put components in boxes.
  4. Put panels in unibus cabinets.
  5. Lay out system on floor.
  6. Do the cabling.
  Solve each subtask and move on to the next one in the fixed order.

Key Elements of Knowledge Base:
  • Properties of (roughly 500) VAX components
  • Rules for determining when to move to next subtask based on system state
  • Rules for carrying out subtasks (to extend partial configuration)
  (Approximately 1200 rules total)

Key Elements of Global Data Base:
  • Customer order
  • Current task
  • Partial configuration (system state)

Key Elements of Control Structure:
  • “MATCH” (data driven) (no backtracking)



TABLE 1-1-4. Characteristics of Example Expert Systems.

SYSTEM:       MYCIN
INSTITUTION:  Stanford University
AUTHORS:      Shortliffe
FUNCTION:     Diagnosis

Purpose:
  Diagnosis of bacterial infections and recommendations for antibiotic therapy.

Approach:
  • Represent expert judgmental reasoning as condition-conclusion rules together
    with the expert’s “certainty” estimate for each rule.
  • Chain backwards from hypothesized diagnoses to see if the evidence supports it.
  • Exhaustively evaluate all hypotheses.
  • Match treatments to all diagnoses which have high certainty values.

Key Elements of Knowledge Base:
  • Rules linking patient data to infection hypotheses
  • Rules for combining certainty factors
  • Rules for treatment

Key Elements of Global Data Base:
  • Patient history and diagnostic tests
  • Current hypothesis
  • Status
  • Conclusions reached thus far, and rule numbers justifying them

Key Elements of Control Structure:
  • Backward chaining thru the rules
  • Exhaustive search
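
The approach summarized in Table 1-1-4 (condition-conclusion rules with a certainty estimate, backward chaining from a hypothesis, exhaustive evaluation) can be sketched as below. This is a simplified illustration rather than MYCIN's implementation: the rules, evidence names and the drug name are invented, only positive certainty factors are handled, the 0.2 applicability threshold and the combining formula `cf1 + cf2 * (1 - cf1)` follow the commonly described certainty-factor scheme, and circular rules are assumed not to occur.

```python
# Hypothetical rules: (premises, conclusion, rule certainty). Not real MYCIN rules.
RULES = [
    (["gram_negative", "rod_shaped"], "enterobacteriaceae", 0.8),
    (["hospital_acquired"], "enterobacteriaceae", 0.4),
    (["enterobacteriaceae"], "treat_with_drug_x", 0.9),
]


def combine(cf1, cf2):
    """Combine two positive certainty factors supporting the same conclusion."""
    return cf1 + cf2 * (1.0 - cf1)


def certainty(goal, evidence, rules=RULES, threshold=0.2):
    """Chain backwards from goal, exhaustively evaluating every rule for it."""
    if goal in evidence:  # directly observed patient data
        return evidence[goal]
    total = 0.0
    for premises, conclusion, rule_cf in rules:
        if conclusion != goal:
            continue
        # A conjunction of premises is only as certain as its weakest member.
        premise_cf = min(certainty(p, evidence, rules, threshold) for p in premises)
        if premise_cf > threshold:  # rule applies only with enough support
            total = combine(total, rule_cf * premise_cf)
    return total


evidence = {"gram_negative": 1.0, "rod_shaped": 0.9, "hospital_acquired": 0.5}
cf = certainty("enterobacteriaceae", evidence)
print(round(cf, 3))  # 0.776
```

Because every rule concluding the goal is evaluated and its contribution merged with `combine`, two weak lines of evidence reinforce each other instead of the first applicable rule deciding the answer.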



shows how the approach translates into key elements of the Knowledge Base, Global Data Base 
and Control Structure. An indication of the basic control structures of the systems in Tables 1-1-1
thru 1-1-4, and some of the other well known expert systems, is given in Table 1-2.

Table 1-2 represents expert system control structures in terms of the search direction, the con- 
trol techniques utilized, and the search space transformations employed. The approaches used in 
the various expert systems are different implementations of two basic ideas for overcoming the 
combinatorial explosion associated with search in real complex problems. These two ideas are: 

(1) Find ways to efficiently search a space, 

(2) Find ways to transform a large search space into smaller manageable chunks that can be 
searched efficiently. 
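
A short worked calculation suggests why the second idea matters. The sketch below assumes, purely for illustration, a problem with 12 parameters of 10 possible values each that happens to split into 4 independent chunks of equal size; the function names are hypothetical, and real problems rarely decompose this cleanly.

```python
def flat_search_size(values_per_parameter, parameters):
    """Candidates in an unstructured space: every combination of every parameter."""
    return values_per_parameter ** parameters


def decomposed_search_size(values_per_parameter, parameters, chunks):
    """Candidates examined when the parameters split into independent,
    equally sized chunks (an idealized assumption) searched one at a time."""
    if parameters % chunks != 0:
        raise ValueError("parameters must divide evenly into chunks")
    return chunks * values_per_parameter ** (parameters // chunks)


flat = flat_search_size(10, 12)            # 10**12 combinations
split = decomposed_search_size(10, 12, 4)  # 4 * 10**3 combinations
print(flat, split)  # 1000000000000 4000
```

Under these assumptions the decomposed search examines 4,000 candidates instead of a trillion, which is the motivation for techniques such as breaking a problem into sub-problems and hierarchical refinement.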

It will be observed from Table 1-2 that there is little architectural commonality based either on 
function or domain of expertise. Instead, expert system design may best be considered as an art 
form, like custom home architecture, in which the chosen design can be implemented from the 
collection of available AI techniques in heuristic search and problem solving. 

In addition to the techniques indicated in Table 1-2, also emerging are distributed knowledge 
and problem solving approaches exemplified by the MDX expert system (Chandrasekaran, 1983) 
and the object-oriented programming language, LOOPS (Stefik et al., 1983). 

H. Existing Expert Systems 

Table 1-3 is a list, classified by function and domain of use, of most of the existing major expert 
systems. It will be observed that there is a predominance of systems in the Medical and Chemistry 
domains following from the pioneering efforts at Stanford University. From the list, it is also ap- 
parent that Stanford University dominates in number of systems, followed by M.I.T., CMU, 
BBN and SRI, with several dozen scattered efforts elsewhere. 

The list indicates that thus far the major areas of expert systems development have been in 
diagnosis, data analysis and interpretation, planning, computer-aided instruction, analysis, and 
automatic programming. However, the list also indicates that a number of pioneering expert 
systems already exist in quite a number of other functional areas. In addition, a substantial effort 
is under way to build expert systems as tools for constructing expert systems. 

I. Constructing an Expert System 

Duda (1981, p. 262) states that to construct a successful expert system, the following prere- 
quisites must be met: 

• there must be at least one human expert acknowledged to perform the task well. 

• the primary source of the expert’s exceptional performance must be special knowledge, 
judgment, and experience. 

• the expert must be able to explain the special knowledge and experience and the methods
used to apply them to particular problems. 

• the task must have a well-bounded domain of application. 

Using present techniques and programming tools, the effort required to develop an expert 
system appears to be converging towards five man-years, with most endeavors employing two to 
five people in the construction. 
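TABLE 1-2. Control Structures of Some Well Known Expert Systems.

Column groups: Search Direction; Control; Search Space Transformations.

Columns, in the original order: Forward; Backward; Forward and Backward; Event Driven;
Exhaustive Search; Generate and Test; Guessing; Relevant Backtracking; Least Commitment;
Multilines of Reasoning; Network Editor; Beam Search; Multiple Models; Break into
Sub-Problems; Hierarchical Refinement; Hierarchical Resolution; Meta Rules.

System         Function            Domain            Columns marked X
MYCIN          Diagnosis           Medicine          Backward; Exhaustive Search
DENDRAL        Data Interpr.       Chemistry         Forward; Generate and Test
EL             Analysis            Elec. Circuits    Forward; Guessing; Relevant Backtracking
GUIDON         C.A.I.              Medicine          Event Driven
KAS            Knowl. Acquis.      Geology           Forward; Network Editor
META-DENDRAL   Learning            Chemistry         Forward; Generate and Test
AM             Concept Formation   Math              Forward; Generate and Test
VM             Monitoring          Medicine          Event Driven; Exhaustive Search
GA1            Data Interpr.       Chemistry         Forward; Generate and Test
R1             Design              Computers         Forward; Exhaustive Search; Break into Sub-Problems
ABSTRIPS       Planning            Robots            Backward; Hierarchical Refinement
NOAH           Planning            Robots            Backward; Least Commitment; Break into Sub-Problems
MOLGEN         Design              Genetics          Forward and Backward; Guessing; Relevant Backtracking;
                                                     Least Commitment; Break into Sub-Problems;
                                                     Hierarchical Refinement; Meta Rules
SYN            Design              Elec. Circuits    Forward; Multiple Models
HEARSAY II     Signal Interpr.     Speech Unders.    Forward and Backward; Least Commitment;
                                                     Multilines of Reasoning; Hierarchical Resolution
HARPY          Signal Interpr.     Speech Unders.    Forward; Beam Search
CRYSALIS       Data Interpr.       Crystallography   Event Driven; Generate and Test; Meta Rules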



TABLE 1-3. Existing Expert Systems by Function.

Function
  Domain                                             System*                     Institution

Diagnosis
  Medicine                                           PIP                         M.I.T.
  Medicine                                           CASNET                      Rutgers U.
  Medicine                                           INTERNIST/CADUCEUS          U. of Pittsburgh
  Medicine                                           MYCIN                       Stanford U.
  Medicine                                           PUFF                        Stanford U.
  Medicine                                           MDX                         Ohio State U.
  Computer Faults                                    DART                        Stanford U./IBM
  Computer Faults                                    IDT                         DEC
  Nuclear Reactor Accidents                          REACTOR                     E G & G Idaho Inc.

Data Analysis and Interpretation
  Geology                                            DIPMETER ADVISOR            M.I.T./Schlumberger
  Chemistry                                          DENDRAL                     Stanford U.
  Chemistry                                          GA1                         Stanford U.
  Geology                                            PROSPECTOR                  SRI
  Protein Crystallography                            CRYSALIS                    Stanford U.
  Determination of Causal Relationships in Medicine  RX                          Stanford U.
  Determination of Causal Relationships in Medicine  ABEL                        M.I.T.
  Oil Well Logs                                      ELAS                        AMOCO

Analysis
  Electrical Circuits                                EL                          M.I.T.
  Symbolic Mathematics                               MACSYMA                     M.I.T.
  Mechanics Problems                                 MECHO                       Edinburgh
  Naval Task Force Threat Analysis                   TECH                        Rand/NOSC
  Earthquake Damage Assessment for Structures        SPERIL                      Purdue U.
  Digital Circuits                                   CRITTER                     Rutgers U.

Design
  Computer System Configurations                     R1/XCON                     C.M.U./DEC
  Circuit Synthesis                                  SYN                         M.I.T.
  Chemical Synthesis                                 SYNCHEM                     SUNY Stonybrook


*References to these systems can be found in Duda (1981), Stefik et al. (1982), Buchanan (1981), Buchanan and Duda
(1982), Barr and Feigenbaum (1982), IJCAI-81, and AAAI-82.



TABLE 1-3. Existing Expert Systems by Function (cont.)

Function
  Domain                                             System*                     Institution

Planning
  Chemical Synthesis                                 SECHS                       U. of Cal. Santa Cruz
  Robotics                                           NOAH                        SRI
  Robotics                                           ABSTRIPS                    SRI
  Planetary Flybys                                   DEVISER                     JPL
  Errand Planning                                    OP-PLANNER                  Rand
  Molecular Genetics                                 MOLGEN                      Stanford U.
  Mission Planning                                   KNOBS                       MITRE
  Job Shop Scheduling                                ISIS-II                     CMU
  Design of Molecular Genetics Experiments           SPEX                        Stanford U.
  Medical Diagnosis                                  HODGKINS                    M.I.T.
  Naval Aircraft Ops                                 AIRPLAN                     CMU
  Tactical Targeting                                 TATR                        RAND

Learning from Experience
  Chemistry                                          METADENDRAL                 Stanford U.
  Heuristics                                         EURISKO                     Stanford U.

Concept Formation
  Mathematics                                        AM                          CMU

Signal Interpretation
  Speech Understanding                               HEARSAY II                  CMU
  Speech Understanding                               HARPY                       CMU
  Machine Acoustics                                  SU/X                        Stanford U.
  Ocean Surveillance                                 HASP                        System Controls Inc.
  Sensors On Board Naval Vessels                     STAMMER-2                   NOSC, San Diego/SDC
  Medicine: Left Ventricular Performance             ALVEN                       U. of Toronto
  Military Situation Determination                   ANALYST                     MITRE

Monitoring
  Patient Respiration                                VM                          Stanford U.

Use Advisor
  Structural Analysis Computer Program               SACON                       Stanford U.

Computer Aided Instruction
  Electronic Troubleshooting                         SOPHIE                      B.B.N.
  Medical Diagnosis                                  GUIDON                      Stanford U.
  Mathematics                                        EXCHECK                     Stanford U.
  Steam Propulsion Plant Operation                   STEAMER                     BBN
  Diagnostic Skills                                  BUGGY                       BBN
  Causes of Rainfall                                 WHY                         BBN
  Coaching of a Game                                 WEST                        BBN
  Coaching of a Game                                 WUMPUS                      M.I.T.
                                                     SCHOLAR                     BBN



TABLE 1-3. Existing Expert Systems by Function (cont.)

Function
  Domain                                             System*                     Institution

Knowledge Acquisition
  Medical Diagnosis                                  TEIRESIAS                   Stanford U.
  Medical Consultation                               EXPERT                      Rutgers
  Geology                                            KAS                         SRI

Expert System Construction
                                                     ROSIE                       Rand
                                                     AGE                         Stanford U.
                                                     HEARSAY III                 USC/ISI
                                                     EMYCIN                      Stanford U.
                                                     OPS 5                       CMU
                                                     RAINBOW                     IBM
  Medical Diagnosis                                  KMS                         U. of MD
  Medical Consultation                               EXPERT                      Rutgers
  Electronic Systems Diagnosis                       ARBY                        Smart Sys. Tech.
  Medical Consultation Using Time-Oriented Data      MECS-AI                     Tokyo U.

Consultation/Intelligent Assistant
  Battlefield Weapons Assignments                    BATTLE                      NRL AI Lab
  Medicine                                           Digitalis Therapy Advisor   M.I.T.
  Radiology                                          RAYDEX                      Rutgers U.
  Computer Sales                                     XCEL                        CMU/DEC
  Medical Treatment                                  ONCOCIN                     Stanford U.
  Nuclear Power Plants                               CSA Model-Based Nuclear     GA Tech
                                                     Power Plant Consultant
  Diagnostic Prompting in Medicine                   RECONSIDER                  U. of CA, S.F.

Management
  Automated Factory                                  IMS                         CMU
  Project Management                                 CALLISTO                    DEC

Automatic Programming
  Modelling of Oil Well Logs                         ΦNIX                        Schlumberger-Doll Res
                                                     CHI                         Kestrel Inst.
                                                     PECOS                       Stanford U.
                                                     LIBRA                       Stanford U.
                                                     SAFE                        USC/ISI
                                                     DEDALUS                     SRI
                                                     Programmer’s Apprentice     M.I.T.

Image Understanding
                                                     VISIONS                     U. of Mass.
                                                     ACRONYM                     Stanford U.



J. Summary of the State-of-the-Art 

Buchanan (1981, pp. 6-7) indicates that the current state of the art in expert systems is 
characterized by: 

• Narrow domain of expertise 

Because of the difficulty in building and maintaining a large knowledge base, the typical do- 
main of expertise is narrow. The principal exception is INTERNIST, for which the knowledge 
base covers 500 disease diagnoses. However, this broad coverage is achieved by using a relatively 
shallow set of relationships between diseases and associated symptoms. (INTERNIST is now be- 
ing replaced by CADUCEUS, which uses causal relationships to help diagnose simultaneous 
unrelated diseases.) 

• Limited knowledge representation languages for facts and relations 

• Relatively inflexible and stylized input-output languages 

• Stylized and limited explanations by the systems 

• Laborious construction 

At present, it requires a knowledge engineer to work with a human expert to laboriously extract 
and structure the information to build the knowledge base. However, once the basic system has 
been built, in a few cases it has been possible to write knowledge acquisition systems to help ex- 
tend the knowledge base by direct interaction with a human expert, without the aid of a 
knowledge engineer. 

• Single expert as a “knowledge czar.”

We are currently limited in our ability to maintain consistency among overlapping items in the 
knowledge base. Therefore, though it is desirable for several experts to contribute, one expert 
must maintain control to ensure the quality of the data base.

• Fragile behavior 

In addition, most systems exhibit fragile behavior at the boundaries of their capabilities. Thus,
even some of the best systems come up with wrong answers for problems just outside their
domain of coverage. Even within their domain, systems can be misled by complex or unusual cases,
by cases for which they do not yet have the needed knowledge, or by cases with which even the
human experts have difficulty.

• Requires Knowledge Engineer to Operate 

Another limitation is that most current systems can be operated successfully only by their
builders or other knowledge engineers, since a friendly interface has not yet been constructed.
Nevertheless, Randy Davis (1982) observes that there have been notable successes. A
methodology has been developed for explicating informal knowledge. Representing and using
empirical associations, five systems (DENDRAL, MACSYMA, MOLGEN, R1 and PUFF) have been
routinely solving difficult problems and are in regular use. The first three all have serious
users who are only loosely coupled to the system designers. DENDRAL, which analyzes chemical
instrument data to determine the underlying molecular structure, has been the most widely used
program (see Lindsay et al., 1980). R1, which is used to configure VAX computer systems, has
been reported to be saving DEC twenty million dollars per year, and is now being followed up
with XCON. In addition, as indicated in Table 1-3, dozens of systems have been constructed and
are being experimented with.

K. Future Trends 

Figure 1-2 lists some of the expert systems applications currently under development. 

It will be observed that there appear to be few domain or functional limitations in the ultimate 
use of expert systems. However, the nature of expert systems is changing. The limitations of rule- 
based systems are becoming apparent. Not all knowledge can be readily structured in the form of 
empirical associations. Empirical associations tend to hide causal relations (present only implicit- 
ly in such associations). Empirical associations are also inappropriate for highlighting structure 
and function. 

Thus, the newer expert systems are adding deep knowledge having to do with causality and 
structure. These systems will be less fragile, thereby holding the promise of yielding correct 
answers often enough to be considered for use in autonomous systems, not just as intelligent 
assistants. 

The other change is a trend towards an increasing number of non-rule based systems. These 
systems, utilizing semantic networks, frames and other knowledge representations, are often bet- 
ter suited for causal modeling and representing structure. They also tend to simplify the reasoning 
required by providing knowledge representations more appropriate for the specific problem 
domain. 


• Medical diagnosis and prescription 

• Medical knowledge automation 

• Chemical data interpretation 

• Chemical and biological synthesis 

• Mineral and oil exploration 

• Planning/scheduling 

• Signal interpretation 

• Signal fusion: situation interpretation
from multiple sensors

• Military threat assessment 

• Tactical targeting 

• Space defense 


• Air traffic control 

• Circuit diagnosis 

• VLSI design 

• Equipment fault diagnosis 

• Computer configuration selection 

• Speech understanding 

• Intelligent Computer-Aided Instruction 

• Automatic Programming 

• Intelligent knowledge base access and 
management 

• Tools for building expert systems 


Figure 1-2. Expert System Applications Now Under Development. 
Figure 1-3 (based largely on the Hayes-Roth IJCAI-81 expert systems tutorial and on Feigenbaum,
1982) indicates some of the future opportunities for expert systems. Again no limitation is
apparent.

It thus appears that expert systems will eventually find use in most endeavors which require 
symbolic reasoning with detailed professional knowledge — which includes much of the world’s 
work. In the process, there will be exposure and refinement of the previously private knowledge 
in the various fields of applications. 

On a more near-term scale, in the next few years we can expect to see expert systems with 
thousands of rules. In addition to the increasing number of rule-based systems we can also expect 
to see an increasing number of non-rule based systems. Also anticipated are much improved
explanation systems that can explain (make “transparent”) why an expert system did what it did
and what things are of importance.

• Building and Construction
  Design, planning, scheduling, control

• Equipment
  Design, monitoring, control, diagnosis, maintenance, repair, instruction

• Command and Control
  Intelligence analysis, planning, targeting, communication

• Weapon Systems
  Target identification, adaptive control, electronic warfare

• Professions
  (Medicine, law, accounting, management, real estate, financial, engineering)
  Consulting, instruction, analysis

• Education
  Instruction, testing, diagnosis, concept formation and new knowledge development from
  experience

• Imagery
  Photo interpretation, mapping, geographic problem-solving

• Software
  Instruction, specification, design, production, verification, maintenance

• Home Entertainment and Advice-giving
  Intelligent games, investment and finances, purchasing, shopping, intelligent information
  retrieval

• Intelligent Agents
  To assist in the use of computer-based systems

• Office Automation
  Intelligent systems

• Process Control
  Factory and plant automation

• Exploration
  Space, prospecting, etc.

Figure 1-3. Future Opportunities for Expert Systems.

By the late 80’s, we can expect to see intelligent, friendly and robust human interfaces and 
much better system building tools. 

Somewhere around the year 2000, we can expect to see the beginnings of systems which
semi-autonomously develop knowledge bases from text. The result of these developments may very
well herald a maturing information society where expert systems put experts at everyone’s
disposal. In the process, production and information costs should greatly diminish, opening up
major new opportunities for societal betterment.

II. COMPUTER VISION


A. Introduction 

Computer Vision (visual perception employing computers) shares with “Expert Systems”
the role of being one of the most popular topics in Artificial Intelligence today. The field is
multifaceted, with many participants, diverse viewpoints, and a large body of published papers.
It is nevertheless still in the early stages of development: organizing principles have not yet
fully crystallized, and the associated technology has not yet been completely rationalized. Even
so, commercial vision systems have already begun to be used in manufacturing and robotic systems
for inspection and guidance tasks, and other systems (at various stages of development) are
beginning to be employed in military, cartographic and image interpretation applications.

B. Definition

Computer (computational or machine) vision can be defined as perception by a computer 
based on visual sensory input. Barrow and Tenenbaum (1981, p. 573) state: 

Vision is an information-processing task with well-defined input and output. The input consists of arrays of 
brightness values, representing projections of a three-dimensional scene recorded by a camera or comparable 
imaging device. Several input arrays may provide information in several spectral bands (color) or from multiple 
viewpoints (stereo or time sequence). The desired output is a concise description of the three-dimensional scene 
depicted in the image, the exact nature of which depends upon the goals and expectations of the observer. It 
generally involves a description of objects and their interrelationships, but may also include such information as 
the three-dimensional structures of surfaces, their physical characteristics (shape, texture, color, material), and 
the locations of shadows and light sources . . . 

C. Relation to Human Vision 

MIT’s Marr and Nishihara (1978, p. 42) take the view that “Artificial Intelligence is (or ought
to be) the study of information processing problems that characteristically have their roots in
some aspects of biological information processing.” They developed a computational theory of
vision based on their study of human vision. Figure II-1 represents the transition from the raw
image through the primal sketch to the 2-1/2D sketch (exemplified by Figure II-2), which contains
information on local surface orientations, boundaries, and depths.

The primal sketch, reminiscent of an artist’s hurried drawing, is a primitive but rich description
of the way the intensities change over the visual field. It can be represented by a set of short line
segments separating regions of different brightnesses. A list of the properties of the line
segments, such as location, length, and orientation for each segment, can be used to represent the
primal sketch.
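
The list-of-segments representation described above can be illustrated with a minimal Python
sketch. It is only an illustration under stated assumptions, not Marr's primal-sketch
computation: the names EdgeSegment and vertical_edge_segments, the brightness threshold, and the
restriction to vertical boundaries are all hypothetical choices made for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeSegment:
    """One short line segment separating regions of different brightness."""
    row: int            # row of the topmost point of the segment
    col: int            # boundary lies between columns col and col + 1
    length: int         # number of consecutive rows the boundary spans
    orientation: float  # degrees from the horizontal axis (90 = vertical)


def vertical_edge_segments(image, threshold=50):
    """Find vertical boundaries where brightness jumps between adjacent columns.

    The threshold is an assumed tuning value, not taken from the report.
    """
    rows, cols = len(image), len(image[0])
    segments = []
    for c in range(cols - 1):
        start = None
        # Iterate one row past the end so an open segment is always closed.
        for r in range(rows + 1):
            is_edge = r < rows and abs(image[r][c + 1] - image[r][c]) >= threshold
            if is_edge and start is None:
                start = r
            elif not is_edge and start is not None:
                segments.append(EdgeSegment(start, c, r - start, 90.0))
                start = None
    return segments


image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 10, 10],
]
primal_sketch = vertical_edge_segments(image)
```

Each entry in primal_sketch carries the location, length and orientation of one segment, which
is the kind of property list the text describes.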

The late Dr. Marr and his associates’ development of a human visual information processing 
theory (Marr, 1982) has had a substantial impact on computational vision. 

[Figure II-1: block diagram. Recoverable labels: INTENSITY REPRESENTATIONS; PROCESSING; SURFACE;
VISIBLE SURFACE REPRESENTATIONS.]

The computations begin with representations of the intensities in an image: first the image itself
(e.g., the gray-level intensity array) and then the primal sketch, a representation of spatial variations
in intensity. Next comes the operation of a set of modules, each employing certain aspects of the
information contained in the image to derive information about local orientation, local depth, and
the boundaries of surfaces. From this is constructed the so-called 2-1/2 dimensional sketch. Note
that no "high-level" information is yet brought to bear: the computations proceed by utilizing only
what is available in the image itself.

After: Marr and Nishihara, 1978, p. 42.

Figure II-1. A Framework for Early and Intermediate States in A Theory of Visual Information
Processing.

A candidate for the so-called 2-1/2-dimensional sketch, which encompasses local determinations of
the depth and orientation of surfaces in an image, as derived from processes that operate upon the
primal sketch or some other representation of changes in gray-level intensity. The lengths of the
needles represent the degree of tilt at various points in the surface; the orientations of the needles
represent the directions of tilt. . . Dotted lines show contours of surface discontinuity. No explicit
representation of depth appears in this figure.

Source: Marr and Nishihara, 1978, p. 41.

Figure II-2. An Example of a 2-1/2D Sketch.

There are strong indications (see, e.g., Gevarter, 1977) that the interpretative planning areas of
the human brain set up a context for processing the input data. (This viewpoint is captured by
Minsky’s (1975) AI “frame” concept for knowledge representation.) The brain then uses visual
and other cues from the environment to draw in past knowledge to generate an internal
representation and interpretation of the scene. This knowledge-based expectation-guided approach to
vision is now appearing in advanced AI computer vision systems.

D. Basis for a General Purpose Image Understanding System 

Barrow and Tenenbaum (1981, p. 573) observe that in going from a scene to an image (an array
of brightness values), the image encodes much information about the scene, but the information
is confounded in the single brightness value at each point. In projecting onto the
two-dimensional image, information about the three-dimensional structure of the scene is lost. In
order to decode brightness values and recover a scene description, it is necessary to employ a
priori knowledge embodied in models of the scene domain, the illumination, and the imaging
process.

As indicated by Figure II-3, computer vision is an active process that uses these models to inter- 
pret the sensory data. To accommodate the diversity of appearance found in real imagery, a high- 
performance, general-purpose system must embody a great deal of knowledge in its models. 



Source: Barrow and Tenenbaum, 1981, p. 573. 


Figure II-3. Model-based Interpretation of Images.

E. Basic Paradigms for Computer Vision*

In broad terms, an image understanding system starts with the array of pixel amplitudes that 
define the computer image, and using stored models (either specific or generic) determines the 
content of a scene. Typically, various symbolic features such as lines and areas are first deter- 
mined from the image. These are then compared with similar features associated with stored 
models to find a match, when specific objects are being sought. In more generic cases, it is 
necessary to determine various characteristics of the scene, and using generic models determine 
from geometric shapes and other factors (such as allowable relationships between objects) the 
nature of the scene content. 

A variety of paradigms have been proposed to accomplish these tasks in image understanding 
systems. These paradigms are based on a common set of broadly defined processing and 
manipulating elements: feature extraction, symbolic representation, and semantic interpretation. 
The paradigms differ primarily in how these elements (defined below) are organized and con- 
trolled, and the degree of artificial intelligence and knowledge employed. 

1. Hierarchical Bottom-up Approach

Figure II-4A is a block diagram of a hierarchical paradigm of an image understanding system
that employs a bottom-up processing approach. The hierarchical bottom-up approach can be
developed successfully for domains with simple scenes made up of only a limited number of
previously known objects.

2. Hierarchical Top-down Approach 

This approach (usually called hypothesize and test), shown in Figure II-4B, is goal directed, the 
interpretation stage being guided in its analysis by trial or test descriptions of a scene. An example 
would be using template matching — matched filtering — to search for a specific object or struc- 
ture within the scene. Matched filtering is normally performed at the pixel level by cross correla- 
tion of an object template with an observed image field. It is often computationally advan- 
tageous, because of the reduced dimensionality, to perform the interpretation at a higher level in 
the chain by correlating image features or symbols rather than pixels. 
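
Pixel-level matched filtering of the kind mentioned above can be sketched in a few lines of
Python. This is a minimal illustration that assumes small images stored as lists of lists and
uses the normalized correlation coefficient as the match score; the function names
normalized_cross_correlation and best_match are hypothetical and do not come from Pratt (1978).

```python
import math


def normalized_cross_correlation(image, template, top, left):
    """Correlation coefficient between the template and the image patch at (top, left)."""
    h, w = len(template), len(template[0])
    patch = [image[top + r][left + c] for r in range(h) for c in range(w)]
    templ = [template[r][c] for r in range(h) for c in range(w)]
    mean_p = sum(patch) / len(patch)
    mean_t = sum(templ) / len(templ)
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(patch, templ))
    den = math.sqrt(sum((p - mean_p) ** 2 for p in patch) *
                    sum((t - mean_t) ** 2 for t in templ))
    # A flat patch or flat template has no structure to correlate; score it as 0.
    return num / den if den > 0 else 0.0


def best_match(image, template):
    """Slide the template over every position; return (score, top, left) of the best fit."""
    h, w = len(template), len(template[0])
    candidates = [
        (normalized_cross_correlation(image, template, top, left), top, left)
        for top in range(len(image) - h + 1)
        for left in range(len(image[0]) - w + 1)
    ]
    return max(candidates)


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 1, 0],
    [0, 0, 1, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 1],
    [1, 9],
]
score, top, left = best_match(image, template)
```

The exhaustive search over every pixel position is what makes this approach expensive, which is
why correlating a smaller set of features or symbols can be advantageous.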

3. Heterarchical Approach 

Hierarchical image understanding systems are normally designed for specific applications.
They thus tend to lack adaptability. A large amount of processing is also usually required. Pratt
(1978, pp. 572-573) observes that often much of this processing is wasted in the generation of
features and symbols not required for the analysis of a particular scene. A technique to avoid this
problem is to establish a central monitor to observe the overall performance of the image
understanding system and then issue commands to the various system elements to modify their
operation to maximize system performance and efficiency.

Figure II-4C is a block diagram of an image understanding system that achieves heterarchical 
operation by distributed feedback control. 

*This section is primarily based on Pratt, 1978, pp. 570-574.

[Figure II-4: four block diagrams. Panels: A. Hierarchical Bottom-up Approach; B. Hierarchical
Top-down Approach; C. Heterarchical Approach; D. Blackboard Approach. Recoverable block labels:
IMAGE, FEATURES, SYMBOLS, REPRESENTATION, INTERPRETATION, DESCRIPTION, FEATURE CONTROL,
SYMBOL CONTROL.]

Source: Pratt, 1978, pp. 570-574.

Figure II-4. Basic Image Understanding Paradigms.

4. Blackboard Approach

Another image understanding system configuration called the blackboard model has been pro- 
posed by Reddy and Newell (1975). Figure II-4D is a simplified representation of this approach in 
which the various system elements communicate with each other via a common working data 
storage called the blackboard. Whenever any element performs a task, its output is put into the 
common data storage, which is independently accessible by all other elements. The individual 
elements can be designed to act autonomously to further the common system goal as required. 
The blackboard system is particularly attractive in cases where several hypotheses must be con- 
sidered simultaneously and their components need to be kept track of at various levels of 
representation. 
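
The blackboard organization can be illustrated with a minimal Python sketch. It assumes a toy
one-dimensional "image" and three hypothetical elements (feature_extractor, symbol_builder,
interpreter); none of these names or rules come from Reddy and Newell (1975). The point is only
that each element reads from and writes to the common storage and acts when its inputs appear.

```python
class Blackboard:
    """Common working storage readable and writable by every element."""

    def __init__(self):
        self.entries = {}

    def post(self, level, value):
        self.entries[level] = value

    def read(self, level):
        return self.entries.get(level)


def feature_extractor(bb):
    # Acts once an image is present and no features have been posted yet.
    image = bb.read("image")
    if image is not None and bb.read("features") is None:
        bb.post("features", [i for i, v in enumerate(image) if v > 128])
        return True
    return False


def symbol_builder(bb):
    features = bb.read("features")
    if features is not None and bb.read("symbols") is None:
        bb.post("symbols", {"bright_pixel_count": len(features)})
        return True
    return False


def interpreter(bb):
    symbols = bb.read("symbols")
    if symbols is not None and bb.read("description") is None:
        label = "object present" if symbols["bright_pixel_count"] >= 3 else "background"
        bb.post("description", label)
        return True
    return False


def run(bb, elements):
    """Let every element act whenever it can, until none can contribute."""
    progress = True
    while progress:
        progress = any([element(bb) for element in elements])


bb = Blackboard()
bb.post("image", [10, 200, 210, 220, 15])
# The registration order is deliberately scrambled: control comes from
# what is on the blackboard, not from a fixed calling sequence.
run(bb, [interpreter, symbol_builder, feature_extractor])
```

Because the elements are decoupled, additional knowledge sources or competing hypotheses could
be added as further entries without changing the existing elements.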

F. Levels of Representation 

A computer vision system, like human vision, is commonly considered to be naturally
structured as a succession of levels of representation.

Tenenbaum, et al. (1979, pp. 254-255), sketch in Figure II-5 a way in which to view an
organization of a general-purpose vision system. They divide the figure into two parts. The first is
image oriented (iconic), domain independent, and based on the image data (data driven). The
second part of the figure is symbolic, dependent on the domain and the particular goal of the
vision process.

Figure II-5. Organization of a Visual System.

The first portion takes the image, which consists of an intensity array of picture elements (“pix- 
els,” e.g., 1000 x 1000), and converts it into image features such as edges and regions. These are 
then converted into a set of parallel “intrinsic images,” one each for distance (range), surface 
orientation, reflectance,* etc. 

The second part of the system segments these into volumes and surfaces dependent on our 
knowledge of the domain and the goal of the computation. Using domain knowledge and the 
constraints associated with the relations among objects in this domain, objects are identified and 
the scene analyzed consistent with the system goal. 

G. Research in Model-Based Vision Systems 

Most research efforts in vision have been directed at exploring various aspects of vision, or
toward generating particular processing modules for a step in the vision process, rather than at
devising general purpose vision systems. However, there are currently two major U.S. efforts in
general purpose vision systems: the ACRONYM system at Stanford University under the
leadership of T. Binford, and the VISIONS system at the University of Massachusetts at Amherst
under A. Hanson and E. Riseman.

The ACRONYM system, outlined in Table II-1-1, is designed to be a general purpose,
model-based system that does its major reasoning at the level of volumes rather than images. The
system basically takes a hierarchical top-down approach as in Figure II-4B. ACRONYM has four
essential parts: modeling, prediction, description and interpretation. The user provides ACRONYM
with models of objects (modeled in terms of volume primitives called generalized cones) and their
spatial relationships, as well as generic models and their subclass relationships. These are both
stored in graph form. The program automatically predicts which image features to expect.
Description is a bottom-up process that generates a model-independent description of the image.
Interpretation relates this description to the prediction to produce a three-dimensional
understanding of the scene.

The VISIONS system, outlined in Table II-1-2, can be considered to be a working tool to test
various image understanding modules and approaches. Rather than using specific models, its
high level knowledge is in the form of framelike “schemas” which represent expectations and
expected relationships in particular scene situations. VISIONS is based on monocular images and
does its reasoning at the level of images rather than volumes.

Other research efforts in model-based vision systems are summarized in TABLES III in
Appendix I of Gevarter (1982A). All the research computer vision systems are individually crafted by
the developers, reflecting the developers’ backgrounds, interests and domain requirements. All,
except ACRONYM (and to an extent, 3-D Mosaic, Kanade, 1981), use image (2-D) models and are
viewpoint dependent. Models are mostly described by semantic networks though feature vectors
are also utilized. The systems, capitalizing on their choice to limit their observations to only a few
objects, use predominantly the top-down interpretation of images approach, relying heavily on
prediction.

*Fraction of normal incident illumination reflected.

TABLE II-1-1. Model-Based Vision Systems.

Developer: Brooks et al. (1979), Brooks (1981)
System: ACRONYM
Purpose: General Purpose Vision System
Example Domains: Identifying Airplanes on a Runway in Aerial Images
                 Simulation for Robot Systems and for Automated Grasping of Objects

Approach

• Hierarchical top down approach.
• Reasons between different levels of representation based on a hierarchy of representations.
• High level modeler provides a high level language to manipulate models using symbolic names.
• Predictor and Planner Module is a rule-based system to generate an Observability Graph from
  the Object Graph (3-D object representation consisting of nodes and relational arcs).
• Makes predictions (which are viewpoint insensitive) in the form of symbolic constraint
  expressions with variables.
• Makes a projective transformation from models.
• Predicts appearances of models in images in terms of ribbons and ellipses.
• Incorporates translation and rotation into observable representations.
• Searches for instances of models in images. It employs geometric reasoning in the form of a
  rule based problem-solving system.
• It interprets (matches) in 3-D by enforcing constraints of the 3-D model.

Modeling

• Represents object classes from which subclasses and specific objects are represented by
  numeric constraints.
• Models 3-D objects using volume primitives: generalized cones and ribbons.
• Spatial relations of volume elements within an object defined hierarchically.
• Can model both specific and generic volume elements and relations between them.
• Models are part/whole graphs.
• Volume primitives have local rather than viewer-centered primitives.

Image Feature Extraction & Representation

• Ribbons and curves obtained from an edge mapper.
• Surfaces obtained from a stereo mapper.
• Nodes of the Picture Graph (symbolic version of image) correspond to ribbons, surfaces and
  curves.
• Arcs and relations indicate spatial relations between nodes.

Search & Matching

• Matcher does an interpretation matching by mapping the Observability Graph into the Picture
  Graph.
• Matcher works in a coarse to fine order.
• Combines local matches of ribbons into clusters.
• Searches for maximal subgraph matches in the Observability graph.
• Performs major interpretation at the level of volumes rather than at the level of images.

Remarks

• Aims to be a general vision system.
• Insensitive to viewpoint.
• A goal is to make use of total information for interpretation.
• Feature extraction (e.g., finding lines and regions) still weak.
• Interpretation is limited to scenes with few objects.
• Substantial progress has been achieved in past few years.

TABLE II-1-2. Model-Based Vision Systems.

Developer: Hanson & Riseman (1978a, b)
System: VISIONS
Purpose: Interpreting static monocular scenes
         Can be considered to be a working tool to test various image understanding modules and
         approaches
Example Domains: House scenes from ground level
                 Road scenes from ground level

Approach

• Uses hierarchical modular approach to representation and control.
• Tries to be as general as possible to allow both bottom-up and top-down solution hypotheses
  as well as various intermediate combinations.
• Incorporates the flexibility to utilize various feature extraction modules and multiple
  knowledge sources as required.
• Allows for the possibility of generating and verifying hypotheses along many paths.

Modeling

• Hierarchical structure.
• Scene schemas (like frames) are the highest representation.
• Hierarchy is:
  - schemas
  - objects
  - volumes
  - surfaces
• Proposed representations of 3D surfaces and volumes include:
  - generalized cylinders
  - surface patches with cubic B-splines to represent boundary and blending functions
• Employs semantic networks:
  - nodes represent primitive entities (objects, concepts, situations, etc.)
  - labeled arcs represent relationships between them

Image Feature Extraction & Representation

• Uses both edge finding and region growing to segment the image into a layered directed graph
  of regions, line segments and vertices.
• Uses a hierarchical processing cone (pyramid) to be able to handle image data at various
  levels of resolution.
• Uses a relaxation approach to organize edges into boundaries, and pixel clusters into regions,
  using high-level system guidance (interpretation guided segmentation).

Search & Matching

• Generates and stores partial models in “contexts” (of the CONNIVER programming language)
  which provide a history of decisions to be used when backtracking is necessary.
• Uses a multiple knowledge source heterarchical approach which generates partial models in
  the search space of models. Attempts, using top-down and bottom-up relaxation techniques, to
  converge on a most probable solution.
• Uses rules for focusing on an element of a task, expanding that element by generating new
  hypotheses and verifying new hypotheses.

Remarks

• System (Parma, 1980) did reasonably well in making a crude segmentation of a house scene.
• Viewpoint dependent.
• Schema used depends on specific scene.


H. Industrial Vision Systems 

1. General Characteristics

The prominent aspect of industrial vision systems, in distinction to more general vision 
systems, is that they operate in a relatively known and structured environment. In addition, the 
situation (such as placement of cameras and lighting) can be configured to simplify the computer 
vision problem. Usually, the number and nature of possible objects will tend to be restricted, and 
the visual system will be tailored to the function performed. Thus many of them are based on a 
pattern recognition, rather than an image understanding, approach. Industrial vision systems are 
characteristically used for such activities as inspection, manipulation and assembly. 

A popular organization for industrial computer vision is a two-stage hierarchy with a
bottom-up control flow. The lower level segments the image into regions corresponding to object
surfaces. The higher level uses this segmentation to identify objects from their surface descriptions.

In practice, most successful systems incorporate aspects of both bottom-up and top-down con- 
trol. The bottom-up processing is used to extract prominent features of a part to determine its 
position. Then, top-down control is used to direct a search to determine if the part satisfies an 
inspection criterion. 

Industrial inspection and assembly operations are well suited to model-based analysis, because 
of the well-defined geometric descriptions associated with manufactured items. CAD/CAM 
technology allows the specification of objects using either volumetric or surface-based models. 
These geometrically based models are particularly appropriate to the hypothesis-verify approach, 
in which low-level image features are extracted and matched to an appropriate computer- 
generated 2-D representation. 

In addition to geometric models, objects may also be represented by graphs. In this case, 
recognition becomes a graph-matching process. 

More commonly at present, rather than using geometric models or graphs, industrial vision 
systems are taught by being presented sample parts to be recognized in each of their expected 
stable states. Aspects of the resulting images are typically stored as templates, and recognition 
becomes template matching. The objects can also be represented in terms of their characteristic 
features, such as area, number of holes, etc., and the resulting feature vector stored to be 
matched (via a search process) to the corresponding extracted feature vector of the image during 
system operation. 
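
Feature-vector matching of this kind can be sketched in Python as a nearest-neighbor search.
The part names, feature values and the scaling of each feature by its largest stored value are
hypothetical choices made for illustration; they are not taken from any system described in
this report.

```python
import math

# Hypothetical feature vectors (area in pixels, number of holes, perimeter in pixels)
# recorded while sample parts were shown to the system.
TRAINED_PARTS = {
    "washer":  (300.0, 1.0, 95.0),
    "bracket": (850.0, 2.0, 180.0),
    "plate":   (900.0, 0.0, 120.0),
}


def feature_scales(models):
    """Largest stored value of each feature, used to put features on a comparable scale."""
    n_features = len(next(iter(models.values())))
    return [max(abs(v[i]) for v in models.values()) or 1.0 for i in range(n_features)]


def recognize(observed, models=TRAINED_PARTS):
    """Return the name of the stored model nearest to the observed feature vector."""
    scales = feature_scales(models)

    def distance(model):
        return math.sqrt(sum(((o - m) / s) ** 2
                             for o, m, s in zip(observed, model, scales)))

    return min(models, key=lambda name: distance(models[name]))


part = recognize((870.0, 2.0, 175.0))
```

Scaling matters because raw area values would otherwise dominate the distance and make the
hole count nearly irrelevant.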

To simplify industrial vision systems, the input is usually reduced to a binary (black and white) 
image, so that objects appear as silhouettes. Simplicity is important in industrial vision systems 
because the computation time is limited, as most systems are expected to operate in near real time. 
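
The reduction to a binary image, followed by isolation of the silhouettes, can be sketched as
follows. This is a minimal illustration assuming a fixed threshold and a simple flood fill over
4-connected pixels; it is not the scan-line method of any particular commercial system, and the
names threshold and find_blobs are hypothetical.

```python
from collections import deque


def threshold(image, level):
    """Convert a gray-level image to binary: 1 where brightness >= level, else 0."""
    return [[1 if pixel >= level else 0 for pixel in row] for row in image]


def find_blobs(binary):
    """Label 4-connected regions of 1s; return area and bounding box for each."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r0 in range(rows):
        for c0 in range(cols):
            if binary[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first flood fill from the first unvisited pixel of a blob.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and binary[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            blobs.append({
                "area": len(pixels),
                "min_row": min(p[0] for p in pixels),
                "max_row": max(p[0] for p in pixels),
                "min_col": min(p[1] for p in pixels),
                "max_col": max(p[1] for p in pixels),
            })
    return blobs


gray = [
    [20, 200, 210, 15, 10],
    [25, 220, 30, 12, 190],
    [18, 22, 28, 14, 205],
]
blobs = find_blobs(threshold(gray, 128))
```

Descriptors such as area and bounding box computed this way are the kind of quantities that
can feed a feature vector for recognition.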

2. Examples of Efforts in Industrial Visual Inspection Systems 

Kruger and Thompson (1981) discuss some example efforts of vision systems designed for
inspection. The systems reviewed are primarily for the inspection of printed circuit boards and IC
chips, with template matching being the predominant inspection approach.

Chin (1982) has recently published an extensive bibliography on automated visual inspection
techniques and applications.

3. Examples of Efforts in Industrial Visual Recognition and Location Systems 

Table II-2 (largely derived from Kruger and Thompson, 1981) lists some example efforts of
vision systems designed for industrial part recognition and location. All these systems use a
bottom-up approach. It will be observed that (except for Vamos, 1979, and Albus, et al., 1982) these
systems utilize template or feature vector matching. Vamos works from a 3D wire frame
model, using computer graphics type techniques to transform a model projection into
alignment with observed lines in the image.

Albus’ Machine Vision Group in the NBS Industrial Systems Division is using simplified 3D 
surface models of machined parts to generate expectancy images from needed viewpoints. The 
group is seeking to achieve real-time, hierarchical, multi-sensory, interactive robot guidance. 

4. Commercially Available Industrial Vision Systems 

Gevarter (1982A) surveys many of the Industrial Vision Systems that are currently commercial- 
ly available. Most of the systems require special lighting. 

Many of the systems designed for verification and inspection use pattern recognition, rather
than AI techniques. The systems tend to be bottom-up (see Figure II-4A) because of the speed
required to achieve real-time operations. Often unique edge and feature extraction algorithms are
programmed in hardware or firmware.

The more sophisticated systems tend to utilize variations and improvements on the SRI Vision 
Module described in Table II-2. 

A few systems make good use of structured light for 3D sensing. A number of efforts in visual 
guidance of arc welding also utilize this technique. 
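
The geometry behind structured-light 3D sensing can be illustrated with a small Python sketch.
It rests on simplifying assumptions that are not taken from any specific system in this report:
the camera looks straight down, a single sheet of light is tilted a known angle away from
vertical, and line positions are already expressed in millimeters. Under those assumptions a
surface of height h shifts the projected line sideways by h times the tangent of the angle. The
function names are hypothetical.

```python
import math


def height_from_displacement(displacement_mm, sheet_angle_deg):
    """Height of a surface from the sideways shift of a projected light line.

    Assumes a camera looking straight down and a light sheet tilted
    sheet_angle_deg away from vertical, so shift = height * tan(angle).
    """
    angle = math.radians(sheet_angle_deg)
    if not 0.0 < angle < math.pi / 2:
        raise ValueError("sheet angle must be strictly between 0 and 90 degrees")
    return displacement_mm / math.tan(angle)


def height_profile(line_positions_mm, reference_position_mm, sheet_angle_deg):
    """Convert observed line positions along one scan into a height profile."""
    return [height_from_displacement(abs(p - reference_position_mm), sheet_angle_deg)
            for p in line_positions_mm]


# The line sits at 100 mm on the empty surface and shifts to 110 mm over a part.
profile = height_profile([100.0, 100.0, 110.0, 110.0, 100.0], 100.0, 45.0)
```

Where the shift begins and ends marks the part boundary, and its size gives the part height.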


I. Who Is Doing It 

Rosenfeld, at the University of Maryland, issues a yearly bibliography, arranged by subject
matter, related to the computer processing of pictorial information. The issue covering 1981
(Rosenfeld, 1982) includes nearly 1000 references.

The following is a list by category of the U.S. “principal players” in computer vision.

1. Research Oriented

Universities

Funded Under DARPA IU Program
CMU
U of MD
MIT
U of Mass.
Stanford U
U of Rochester
USC
U of Rhode Island

Other Active Universities
U of Texas at Austin
VPI
Purdue
U of PA
U of IL
Wayne State U
JHU

RPI

Non-Profits
SRI International, AI Center
JPL
ERIM

U.S. Government
NBS, Industrial Systems Div., Gaithersburg, MD
NOSC (Naval Ocean Systems Center), San Diego
NIH (National Institutes of Health)

2. Commercial Vision Systems Developers

Hundreds of companies are now involved in vision systems, a partial listing being given in
Gevarter (1982A).

TABLE II-2. Example Research Efforts in Industrial Visual Recognition and Location Systems.

Developer: Agin (1980), SRI Vision Module
Purpose: Locate, identify and guide manipulation of industrial parts
Sample Domains: Engine parts

Approach

• Bottom-up approach.
• Uses thresholding to convert to a binary image.
• Each line is sequentially scanned and edge points (where pixels change from 1 to 0 or 0 to 1)
  recorded. Each resulting segment on a line is matched to the previous line to determine their
  overlapping relationships. Using these relationships, the program traces the appearance and
  disappearance of blobs (regions) as the image is processed from top to bottom.
• Using blob descriptors, the system can recognize parts regardless of their position or
  orientation. The descriptors are matched using either a binary decision tree or a normalized
  nearest-neighbor method.
• The system is trained by repeatedly showing the object to the TV camera, resulting in all
  potentially useful shape descriptions being automatically calculated and stored.

Modeling and Representations

• Blob descriptors include:
  - max. and min. x and y values
  - holes
  - moments of inertia
  - perimeter length
  - linked list of coordinates on the perimeter

Developer: Holland, Rossol & Ward (1979), Consight I
Purpose: Industrial part location, recognition and manipulation
Sample Domains: Engine parts

Approach

• Two linear light sources superimpose a line of light on a conveyor belt perpendicular to its
  direction of motion. The two lines separate, proportional to the part passing by. Point of
  separation determines part boundary; degree of separation determines part thickness.
• The scene is imaged with a linear array camera and a silhouette automatically generated.
• Uses same feature vector approach as SRI Module.

Modeling and Representations

• Feature vector of part image characteristics.

Developer: NBS: Albus et al. (1982), National Bureau of Standards
Purpose: Visual servoing for robot guidance (real-time location and identification for
         manipulation)
Sample Domains: Machined parts

Approach

• Employs a point light source, a sheets-of-structured-light generator and a camera, all mounted
  on the wrist of a robot arm.
• Uses alternate frames of:
  1. A regular point source illumination of the entire object, and
  2. Two parallel planes of structured light.
• System determines location and orientation based on triangulation associated with relative
  height of intersection of light sheets with part, and recognition based on shape and size of
  observed lines that the planes of light make as they intersect part. Uses this information
  to interpret outline seen in image produced by the point source illumination.
• Analysis of vision input is performed with a hierarchically organized group of
  microprocessors. At each level of the hierarchy, an analytic process is guided by an
  expectancy-generating modeling process. The modeling process is in turn driven by a store of
  a priori knowledge, by knowledge of the robot’s movements, and by feedback from the analytic
  process. Each such level of the hierarchy provides output to guide a corresponding level of
  the robot’s hierarchical control system.

Modeling and Representations

• Uses quadratic approximations to surfaces of idealized 3-D objects.

Developer: Perkins (1978), GM
Purpose: Industrial parts recognition
Sample Domains: Engine components

Approach

• Operates on 32 gray levels.
• Bottom-up scene segmentation approach:
  1. Reduce 256x256 pixel image to an “edge gradient” image
  2. Link edges with similar gradient magnitudes to form chains
  3. Characterize chains as either straight lines or circular arcs.
  (This reduces 65,000 pixel image to about 50 concurves.)
• System matches observed concurves with model generated concurves using:
  1. A preset control structure to select the order in which combinations of model and scene
     concurves are to be matched.
  2. Starts by matching one model and one scene concurve
  3. The stored model is spatially transformed and rotated to fit associated scene concurves
• System interactively trained by generating concurves of sample parts.
• Can identify parts partially occluded by other parts.

Modeling and Representations

• Concurve models of sample parts.

Developer: Yachida and Tsuji (1978), Osaka Univ.
Purpose: Industrial parts recognition
Sample Domains: Nonoccluded parts of a small gasoline engine

Approach

• Uses a boundary detection and isolation of parts in a binary image approach similar to SRI
  Vision Module.
• Recognition system based on a structured step-by-step analysis with the previously stored
  models.
• Uses a series of special feature detectors:
  - hole detector
  - line finder
  - texture detector
  - small hole detector
• System training involves interactive man-machine examination of the identification task.

Modeling and Representations

• Stable orientation models of parts:
  - part name
  - orientation
  - list of primitive features
  - polar coordinate boundary representation

Developer: Vamos (1979), Hungarian Acad. of Science
Purpose: Recognition of 3D objects
Sample Domains: Bearing housings
                Assembly
                Sheet metal parts to be painted
                Neural nets in microscopic-section in neural research

Approach

• Finds edges using a simplified version of the Hueckel-operator using only two linear
  templates.
• Lines are then fitted to edges.
• Wire-frame model transformed (and hidden line elimination used) to correspond to image,
  yielding recognition and part orientation.
• Objects are interactively taught to system either by building a geometric model or by a
  computer-aided transformation of viewed samples.

Modeling and Representations

• 3D Wire Frame Models.

J. Summary of the State-of-the-Art 

1. Human Vision 

Human vision is the only available example of a general purpose vision system. Thus far, not
many AI researchers have taken an interest in the computations performed by natural
visual systems, but this situation is changing.

The MIT vision group (among others) believes that, to a first approximation, the human visual 
system is subdivided into modules specializing in visual tasks. There is also evidence that people 
do global processing first and use it to constrain local processing. 

Considerable information now exists about lower level visual processing in humans. However, 
as we progress up the human visual computing hierarchy, the exact nature of the appropriate 
representations becomes subject to dispute. Thus, overall human visual perception is still very far 
from being understood. 

2. Low and Intermediate Levels of Processing 

Though powerful methods for high-level visual analysis and understanding are still being
worked out, insights into low-level vision are emerging. The basic physics of imaging, and
the nature of constraints in vision and their use in computation, are fairly well understood. Detailed
programs for vision modules, such as “shape from shading” and “optical flow,” have begun to
appear. Also, the representational issues are now better understood.

However, even for well understood low-level operations such as edge detection (see, e.g.,
Ballard, 1982), there has been no convergence among the many techniques proposed, and no
method stands out as the best. In general, edge detectors are still unreliable, though Marr and
Hildreth’s approach, based on the zero crossings of the second derivative of the smoothed image
intensity, appears promising.
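
As a rough illustration of the zero-crossing idea, the sketch below works on a single scan line of intensities: it smooths the line, takes a discrete second derivative, and reports where that derivative changes sign. The function names, the three-point smoothing kernel, the `min_swing` noise threshold and the sample data are all assumptions made for illustration; this one-dimensional toy is not the two-dimensional operator cited in the text.

```python
def smooth(signal):
    """Three-point weighted average (1, 2, 1)/4; end samples are copied unchanged."""
    out = [float(v) for v in signal]
    for i in range(1, len(signal) - 1):
        out[i] = (signal[i - 1] + 2 * signal[i] + signal[i + 1]) / 4.0
    return out


def second_difference(signal):
    """Discrete second derivative; left at zero for the two end samples."""
    out = [0.0] * len(signal)
    for i in range(1, len(signal) - 1):
        out[i] = signal[i - 1] - 2 * signal[i] + signal[i + 1]
    return out


def zero_crossings(signal, min_swing=1.0):
    """Indices i where the second derivative changes sign between i and i + 1.

    min_swing (an assumed, illustrative threshold) suppresses sign changes
    that are too small to be meaningful.
    """
    d2 = second_difference(smooth(signal))
    edges = []
    for i in range(len(d2) - 1):
        if d2[i] * d2[i + 1] < 0 and abs(d2[i] - d2[i + 1]) >= min_swing:
            edges.append(i)
    return edges


# A dark-to-bright step between samples 4 and 5 of a made-up scan line.
scan_line = [10, 10, 10, 10, 10, 50, 50, 50, 50, 50]
print(zero_crossings(scan_line))  # [4]
```

A uniform scan line produces no crossings, which is the behavior one would want before any thresholding on edge strength.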

In industrial vision, the primary technique for achieving robust edge finding and segmentation 
is to use special lighting and convert to a silhouette binary image in which edges and regions are 
readily distinguishable. 

At intermediate levels, edge classification and labelling have been very successfully used in the
blocks world. 

Binford (1982) in reviewing existing research in model-based vision systems observed that most 
systems first segment regions then describe their shape. None of the systems makes effective use 
of texture for segmentation and description. In general, shape description is primitive and inter- 
pretation systems have not yet made full use of even these limited capabilities. 

As yet, the extraction of useful information from color is extremely rudimentary. The percep- 
tual use of motion (optical flow) has been a focus of attention recently, but findings are 
preliminary. 

For low level processing, many recent algorithms take the form of parallel computations in- 
volving local interactions. One popular approach having this character is “relaxation,” in which 
local computations are iteratively propagated to try to extract global features. These locally 
parallel architectures are well suited to rapid parallel processing techniques using special purpose 
VLSI chips. 

3. Industrial Vision Systems 

Barrow and Tenenbaum (1981, p. 572) observe that: 

Significant progress has been made in recent years on practical applications of machine vision. Systems have been
developed that achieve useful levels of performance on complex real imagery in tasks such as inspection of
industrial parts, interpretation of aerial imagery, and analysis of chest x-rays. Virtually all such systems are special
purpose, being heavily dependent on domain-specific constraints and techniques.

It has been estimated that as of mid-1982, though fewer than 50 sophisticated industrial vision
systems were actually in use in the U.S., approximately 1000 simple line-scan inspection systems
were in regular operation. Though special purpose systems have thus far been the most effective,
successful vision applications are now becoming commonplace and are expanding. Vision 
manufacturers are now beginning to provide easier user programming, friendlier user interfaces, 
and systems engineering support to prospective users. Many firms are now entering the industrial 
vision field, with technical leap-frogging being common due to rapidly changing technology. 

4. General Purpose Vision Systems 

Though many practical image recognition systems have been developed, Hiatt (1981, pp. 2, 8) 
observes that, “In current vision applications, the type of scene to be processed and acted upon is 
usually carefully defined and limited to the capability of the machine . . . General purpose com- 
puter vision has not yet been solved in practice.” This domain specificity makes each new applica- 
tion expensive and time consuming to develop. 

Binford (1982) in reviewing current model-based research vision systems concludes that most 
systems have not attempted to be general vision systems, though ACRONYM does demonstrate 
some progress toward this goal. The performance of existing vision systems is strongly limited
by the performance of their segmentation modules, by their weak use of world knowledge, and by
weak descriptions that make little use of shape.

With the exception of ACRONYM (and to an extent 3-D Mosaic), the systems surveyed depend 
on image models and relations, and therefore are strongly viewpoint-dependent. To generalize to 
viewpoint-insensitive interpretations would require three-dimensional modeling and interpreta- 
tion as in ACRONYM. 

Binford concludes that though the results of these and other efforts are encouraging as first
demonstrations, nevertheless as general vision systems, they have a long way to go. 

K. Applications and Future Trends 

Brady (1981, p. 2) states that, “There is currently a surge of interest in image understanding on 
the part of industry.” Examples of current computer vision applications are indicated in 
Figure II-6. 

As the field of computer vision unfolds, we expect to see the following future trends.* 

1. Techniques 

• Though most industrial vision systems have used binary representations, we can expect in- 
creased use of gray scales because of their potential for handling scenes with cluttered 
backgrounds and uncontrolled lighting. 

• Recent theoretical work on monocular shape interpretation from images (shape from
shading, texture, etc.) makes it appear promising that general mechanisms for generating
spatial observations from images will be available within the next 2 to 5 years to support
general vision systems.

• Successful techniques (such as stereo and motion parallax) for deriving shape and/or motion 
from multiple images should also be available within 2 to 5 years. 

• The mathematics of Image Understanding will continue to become more sophisticated. 

• The links now growing between Image Understanding and Theories of Human Vision will
continue to expand.

2. Hardware and Architecture 

• We are now seeing hardware and software emerging that enables real-time operation in sim- 
ple situations. Within the next 2 to 5 years we should see hardware and software that will 
enable similar real-time operation for robotics and other activities requiring recognition, and 
position and orientation information. 

• Fast raster-based pipeline preprocessing hardware to compute low-level features in local
regions of an entire scene is now becoming available and should find general use in
commercial vision systems in 2 to 4 years.

• Because processing seems inherently parallel at virtually all visual levels, parallel
processing is a wave of the future (but not the entire answer).

• Relaxation and constraint analysis techniques are on the increase and will be increasingly 
reflected in future architectures. 

*These trends have been largely derived from statements by Brady (1981A, 1981B), Binford (1982), Kruger and
Thompson (1981), Agin (1980), Arden (1980), Rosenfeld (1981), Hiatt (1981), and Barrow and Tenenbaum (1981).

3. AI and General Vision Systems

Computer vision will be a key factor in achieving many artificial intelligence applications. The
goal is to move from special-purpose visual processing to general-purpose computer vision. Work
to date in model-based systems has made a tentative beginning. But the long-run goal is to be able
to deal with unfamiliar or unexpected input.* Reasoning in terms of generic models and reasoning
by analogy are two approaches being pursued. However, it is anticipated that it will be a
decade or more before substantial progress will be made.

AUTOMATION OF INDUSTRIAL PROCESSES

Object acquisition by robot arms, for example, for sorting or packing items arriving on conveyor
belts.

Automatic guidance of seam welders and cutting tools.

VLSI-related processes, such as lead bonding, chip alignment and packaging.

Monitoring, filtering, and thereby containing the flood of data from oil drill sites or from
seismographs.

Providing visual feedback for automatic assembly and repair.

INSPECTION TASKS

The inspection of printed circuit boards for spurs, shorts, and bad connections.

Checking the results of casting processes for impurities and fractures.

Screening medical images such as chromosome slides, cancer smears, x-ray and ultrasound
images, tomography.

Routine screening of plant samples.

Inspection of alpha-numerics on labels and manufactured items.

Checking packaging and contents in pharmaceutical and food industries.

Inspection of glass items for cracks, bubbles, etc.

REMOTE SENSING

Cartography, the automatic generation of hill-shaded maps, and the registration of satellite
images with terrain maps.

Monitoring traffic along roads, docks, and at airfields.

Management of land resources such as water, forestry, soil erosion, and crop growth.

Detecting mineral ore deposits.

MAKING COMPUTER POWER MORE ACCESSIBLE

Management information systems that have a communication channel considerably wider than
current systems that are addressed by typing or pointing.

Document readers (for those who still use paper).

Design aids for architects and mechanical engineers.

MILITARY APPLICATIONS

Tracking moving objects.

Automatic navigation based on passive sensing.

Target acquisition and range finding.

AIDS FOR THE PARTIALLY SIGHTED

Systems that read a document and speak what they read.

Automatic “guide dog” navigation systems.

Figure II-6. Examples of Applications of Computer Vision Now Underway.

4. Modeling and Programming 

• Now emerging is 3D modeling, arising largely from CAD/CAM technology. 3D CAD/CAM 
data bases will be integrated with industrial vision systems to realistically generate synthe- 
sized images for matching with visual inputs. 

• Illumination models, shading and surface property models will be increasingly incorporated 
into visual systems. 

• Volumetric models which allow prediction and interpretation at the levels of volumes, rather 
than images, will see greater utilization. 

• High level vision programming languages (such as Automatix’s RAIL) that can be integrated 
with robot and industrial manufacturing languages are now beginning to appear and will 
become commonplace within 5 years. 

• Generic representations for amorphous objects (such as trees) have been experimentally 
utilized and should become generally available within 5 years. 

5. Knowledge Acquisition 

• Strategies for indexing into a large database of models should be available within the next 2 
to 5 years. 

• “Training by being told” will supplement “training by example” as computer graphics 
techniques and vision programming languages become more common. 

6. Sensing 

• An important area of development is 3D sensing. Several current industrial vision systems 
are already employing structured light for 3D sensing. A number of new innovative tech- 
niques in this area are expected to appear in the next 5 years. 

• More active vision sensors such as lidar are now being explored, but are unlikely to find 
substantial industrial application until the last half of this decade. 

7. Industrial Vision Systems 

• We will see increased use of advanced vision techniques in industrial vision systems, 
including gray scale imagery. 

• We are now observing a shortening time lag between research advances and their applica- 
tions in industry. It is anticipated that in the future this lag may be as little as one to two 
years. 

• Advanced electronics hardware at reduced cost is increasing the capabilities and speed of in- 
dustrial vision, while simultaneously reducing costs. 

*As computer vision systems move toward this goal, they will increasingly incorporate Expert System components
using multiple knowledge sources. Gevarter (1982B) provides An Overview of Expert Systems, in which ACRONYM 
and VISIONS are considered to be examples of Expert Systems. 

• It is anticipated that special lighting and active sensing will play an increasing role in
industrial vision. 

• Common programming languages and improved interface standards will, within the next 3 to
10 years, enable easier integration of vision systems with robots and into the industrial environment.

8. Future Applications 

• It is anticipated that about one quarter of all industrial robots will be equipped with some 
form of vision system by 1990. 

• It is likely that on the order of 90% of all industrial inspection activities requiring vision will
be done with computer vision systems within the next decade.

• New vision system applications in a wide variety of areas, as yet unexplored, will begin to ap- 
pear within this decade. An example of such a system might be visual traffic monitors at in- 
tersections that could perceive cars, pedestrians, etc., in motion, and control the flow of 
traffic accordingly. 

• Computer vision will play a large role in future military applications. The Defense Mapping 
Agency intends to achieve fully automated production for mapping, charting and geodesy 
by 1995, utilizing “expert system”-guided computer vision facilities. 

L. Conclusion 

In conclusion, the amount of activity and the many researchers in the computer vision field 
suggest that within the next 5 to 10 years, we should see some startling advances in practical com- 
puter vision, though the availability of practical general vision systems still remains a long way 
off. 

III. NATURAL LANGUAGE PROCESSING (NLP)*


A. Introduction 

One major goal of Artificial Intelligence (AI) research has been to develop the means to in- 
teract with machines in natural language (in contrast to a computer language). The interaction 
may be typed, printed or spoken. The complementary goal has been to understand how humans 
communicate. The scientific endeavor aimed at achieving these goals has been referred to as com- 
putational linguistics (or more broadly as cognitive science), an effort at the intersection of AI, 
linguistics, philosophy and psychology. 

Human communication in natural language is an activity of the whole intellect. AI researchers, 
in trying to formalize what is required to properly address natural language, find themselves in- 
volved in the long term endeavor of having to come to grips with this whole activity. (Formal lin- 
guists tend to restrict themselves to the structure of language.) The current AI approach is to con- 
ceptualize language as a knowledge-based system for processing communications and to create 
computer programs to model that process. 

Communication acts can serve many purposes, depending on the goals, intentions and strate- 
gies of the communicator. One goal of the communication is to change some aspect of the 
recipient’s mental state. Thus, communication endeavors to add or modify knowledge, change a 
mood, elicit a response or establish a new goal for the recipients. 

For a computer program to interpret a relatively unrestricted natural language communication, 
a great deal of knowledge is required. Knowledge is needed of: 

— the structure of sentences 

— the meaning of words 

— the morphology of words 

— a model of the beliefs of the sender 

— the rules of conversation, and 

— an extensive shared body of general information about the world. 

This body of knowledge can enable a computer (like a human) to use expectation-driven 
processing in which knowledge about the usual properties of known objects, concepts, and what 
typically happens in situations, can be used to understand incomplete or ungrammatical sentences 
in appropriate contexts. 

B. Applications 

There are many applications for computer-based natural language understanding systems. 
Some of these are listed in Table III-1.


*A more complete treatment of NLP is given in Gevarter (1983). 

TABLE III-1. Some Applications of Natural Language Processing.


Discourse 

Speech Understanding 
Story Understanding 

Information Access 

Information Retrieval 
Question Answering Systems 
Computer-Aided Instruction 

Information Acquisition or Transformation 

Machine Translation 
Document or Text Understanding 
Automatic Paraphrasing 
Knowledge Compilation 
Knowledge Acquisition 


Interaction with Intelligent Programs 

Expert Systems Interfaces 
Decision Support Systems 
Explanation Modules for Computer Actions 
Interactive Interfaces to Computer Programs 

Interacting with Machines 

Control of Complex Machines 

Language Generation 

Document or Text Generation 
Speech Output 

Writing Aids: e.g., grammar checking 


C. Approach 

Natural Language Processing (NLP) systems utilize both linguistic knowledge and domain 
knowledge to interpret the input. As domain knowledge (knowledge about the subject area of 
communication) is so important to understanding, it is usual to classify the various systems based 
on their representation and utilization of domain knowledge. On this basis, Hendrix and 
Sacerdoti (1981) classify systems as Types A, B, or C,* with Type A being the simplest, least 
capable and correspondingly least costly systems. 

1. Type A: No World Models 

a. Key Words or Patterns 

The simplest systems utilize ad hoc data structures to store facts about a limited domain. Input 
sentences are scanned by the programs for predeclared key words, or patterns, that indicate 
known objects or relationships. 
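
A minimal sketch of such a key-word system follows. It scans the input for predeclared key words and object names and looks the answer up in an ad hoc table. The ship name, the stored facts, the key-word table and the function name `answer` are all invented for illustration and do not describe any particular historical system.

```python
import re

# Ad hoc fact store for a tiny, made-up domain: (attribute, object) -> value.
FACTS = {
    ("length", "alpha"): "300 feet",
    ("port", "alpha"): "Port A",
}

# Predeclared key words, each mapped to the attribute it signals.
KEYWORDS = {"long": "length", "length": "length", "port": "port", "docked": "port"}

# Known object names in the domain.
OBJECTS = {"alpha"}


def answer(sentence):
    """Return a stored fact if the sentence names a known attribute and object."""
    words = re.findall(r"[a-z]+", sentence.lower())
    attribute = next((KEYWORDS[w] for w in words if w in KEYWORDS), None)
    obj = next((w for w in words if w in OBJECTS), None)
    if attribute is None or obj is None:
        return None  # no key word or no known object: the system cannot respond
    return FACTS.get((attribute, obj))


print(answer("How long is the Alpha?"))   # 300 feet
print(answer("Where is Alpha docked?"))   # Port A
```

Because word order and sentence structure are ignored entirely, such a program is cheap to build but is easily misled by input outside its narrow set of patterns.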

b. Limited Logic Systems 

In limited logic systems, information in their data base was stored in some formal notation, and 
language mechanisms were utilized to translate the input into the internal form. The internal form 
chosen was such as to facilitate performing logical inferences on information in the data base. 

*Other system classifications are possible, e.g., those based on the range of syntactic coverage. 

2. Type B: Systems That Use Explicit World Models

In these systems, knowledge about the domain is explicitly encoded, usually in frame or net- 
work representations (discussed in a later section) that allow the system to understand input in 
terms of context and expectations. Cullingford’s work (see Schank and Abelson, 1977) on SAM
(Script Applier Mechanism) is a good example of this approach. 

3. Type C: Systems that Include Information about the Goals and Beliefs of Intelligent Entities

These advanced systems (still in the research stage) attempt to include in their knowledge base
information about the beliefs and intentions of the participants in the communication. If the goal
of the communication is known, it is much easier to interpret the message. Schank and Abelson’s 
(1977) work on plans and themes reflects this approach. 

D. The Parsing Problem 

For more complex systems than those based on key words and pattern matching, language 
knowledge is required to interpret the sentences. The system usually begins by “parsing” the in- 
put (processing an input sentence to produce a more useful representation for further analysis). 
This representation is normally a structural description of the sentence indicating the relationship 
of the component parts. To address the parsing problem and to interpret the result, the com-
putational linguistic community has studied syntax, semantics, and pragmatics. Syntax is the 
study of the structure of phrases and sentences. Semantics is the study of meaning. Pragmatics is 
the study of the use of language in context. 

E. Grammars 

Barr and Feigenbaum (1981, p. 229) state, “A grammar of a language is a scheme for specify- 
ing the sentences allowed in the language, indicating the syntactic rules for combining words into 
well-formed phrases and clauses.” The following grammars are some of the most important.* 

1. Phrase Structure Grammar - Context Free Grammar

Chomsky (see, e.g., Winograd, 1983) had a major impact on linguistic research by devising a 
mathematical approach to language. He defined a series of grammars based on rules for rewriting 
sentences into their component parts. He designated these as 0, 1, 2, or 3, based on the restric- 
tions associated with the rewrite rules, with 3 being the most restrictive. 

Type 2 — Context-Free (CF) or Phrase Structure Grammar (PSG) — has been one of the most 
useful in natural-language processing. It has the advantage that all sentence structure derivations 
can be represented as a tree and practical parsing algorithms exist. Though it is a relatively natural 
grammar, it is unable to capture all the sentence constructions found in most natural languages 
such as English. Gazdar (1981) has recently broadened the applicability of CF PSG by adding
augmentations to handle situations that do not fit the basic grammar. This Generalized Phrase
Structure Grammar is now being developed by Hewlett Packard (Gawron et al., 1982).


*Charniak and Wilks (1976) provide a good overview of the various approaches.

2. Transformational Grammar

Tennant (1981, p. 89) observes that “The goal of a language analysis program is recognizing 
grammatical sentences and representing them in a canonical structure (the underlying structure).” 
A transformational grammar (Chomsky, 1957) consists of a dictionary, a phrase structure gram- 
mar and a set of transformations. In analyzing sentences, using a phrase structure grammar, first 
a parse tree is produced. This is called the surface structure. The transformational rules are then 
applied to the parse tree to transform it into a canonical form called the deep (or underlying) 
structure. As the same thing can be stated in several different ways, there may be many surface 
structures that translate into a single deep structure. 

3. Case Grammar 

Case Grammar is a form of Transformational Grammar in which the deep structure is based on 
cases - semantically relevant syntactic relationships. The central idea is that the deep structure of a 
simple sentence consists of a verb and one or more noun phrases associated with the verb in a par- 
ticular relationship. These semantically relevant relationships are called cases. Fillmore (1971) 
proposed the following cases: Agent, Experiencer, Instrument, Object, Source, Goal, Location, 
Time and Path.

The cases for each verb form an ordered set referred to as a “case frame.” A case frame for the 
verb “open” would be: 

(object (instrument) (agent)) 

which indicates that open always has an object, but the instrument or agent can be omitted as in- 
dicated by their surrounding parentheses. Thus the case frame associated with the verb provides a 
template which aids in understanding a sentence. 
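
The case frame for “open” can be sketched as a small data structure plus a check that a candidate analysis supplies the required cases. The table layout, the `check_cases` function and the example sentences are assumptions made for illustration.

```python
# Each verb maps to its case frame: (case name, required?) in frame order.
# For "open": (object (instrument) (agent)), so only the object is required.
CASE_FRAMES = {
    "open": [("object", True), ("instrument", False), ("agent", False)],
}


def check_cases(verb, filled):
    """Compare the cases found in a sentence against the verb's case frame.

    Returns (missing_required_cases, unexpected_cases).
    """
    frame = CASE_FRAMES[verb]
    allowed = {name for name, _ in frame}
    missing = [name for name, required in frame if required and name not in filled]
    unexpected = [name for name in filled if name not in allowed]
    return missing, unexpected


# "John opened the door with the key."
print(check_cases("open", {"agent": "John", "object": "the door",
                           "instrument": "the key"}))      # ([], [])
# "The key opened the door." (agent omitted, which the frame permits)
print(check_cases("open", {"instrument": "the key",
                           "object": "the door"}))         # ([], [])
# An analysis with no object violates the frame.
print(check_cases("open", {"agent": "John"}))              # (['object'], [])
```

Used this way, the frame acts as the template described above: an analysis that leaves a required case unfilled, or fills a case the verb does not take, can be rejected or revised.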

4. Semantic Grammars 

For practical systems in limited domains, it is often more useful to replace conventional
syntactic constituents such as noun phrases, verb phrases and prepositions with meaningful
semantic components. Thus, in place of nouns when dealing with a naval data base, one
might use ships, captains, ports and cargos. This approach gives direct access to the semantics of 
a sentence and substantially simplifies and shortens the processing. Grammars based on this 
approach are referred to as semantic grammars (see, e.g., Burton, 1976). 
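
A minimal sketch of a semantic grammar for a toy naval domain is shown below. The rules are written directly in terms of domain categories (SHIP, PORT) rather than syntactic ones, so a successful match yields the meaning of the query at once. The ship names, the choice of ports, the rule set and the query labels are all hypothetical.

```python
# Semantic categories and the words that belong to them (illustrative only).
LEXICON = {
    "SHIP": {"alpha", "bravo"},
    "PORT": {"norfolk", "rota"},
}

# Each rule pairs a word/category pattern with the query it denotes.
RULES = [
    (("who", "commands", "SHIP"), "captain_of"),
    (("where", "is", "SHIP"), "location_of"),
    (("which", "ships", "are", "in", "PORT"), "ships_in"),
]


def parse(sentence):
    """Return (query, bindings) for the first rule that matches, else None."""
    words = sentence.lower().rstrip("?.").split()
    for pattern, query in RULES:
        if len(words) != len(pattern):
            continue
        bindings = {}
        for word, symbol in zip(words, pattern):
            if symbol in LEXICON:            # semantic category slot
                if word not in LEXICON[symbol]:
                    break
                bindings[symbol] = word
            elif word != symbol:             # literal word must match exactly
                break
        else:
            return query, bindings
    return None


print(parse("Where is Bravo?"))              # ('location_of', {'SHIP': 'bravo'})
print(parse("Which ships are in Norfolk?"))  # ('ships_in', {'PORT': 'norfolk'})
print(parse("Where is Norfolk?"))            # None: a port is not a SHIP
```

The last call shows the built-in semantic check: a sentence that is syntactically fine is rejected because the slot calls for a ship. The cost is that the grammar must be rewritten for each new domain.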

5. Other Grammars 

A variety of other, but less prominent, grammars have been devised. Still others can be ex- 
pected to be devised in the future. One example is Montague Grammar (Dowty et al., 1981) which 
uses a logical functional representation for the grammar and therefore is well suited for the 
parallel-processing logical approach now being pursued by the Japanese (see Nishida and 
Doshita, 1982) for their future AI work as embodied in their Fifth Generation Computer research
project.

F. Semantics and the Cantankerous Aspects of Language

Semantic processing (as it tries to interpret phrases and sentences) attaches meanings to the 
words. Unfortunately, English does not make this as simple as looking up the word in the dic- 
tionary, but provides many difficulties which require context and other knowledge to resolve. 
Examples are: 

1. Multiple Word Senses 

Syntactic analysis can resolve whether a word is used as a noun or a verb, but further analysis is 
required to select the sense (meaning) of the noun or verb that is actually used. For example, 
“fly” used as a noun may be a winged insect, a fancy fishhook, a baseball hit high in the air, or 
several other interpretations as well. The appropriate sense can be determined by context (e.g., 
for “fly” the appropriate domain of interest could be extermination, fishing or sports), or by 
matching each noun sense with the senses of other words in the sentence. This latter approach was 
taken by Rieger and Small (1979) using the (still embryonic) technique of “interacting word
experts,” and by Finin (1980) and McDonald (1982) as the basis for understanding noun
compounds. 
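
The context-based route to choosing a word sense can be sketched as follows: each domain of interest has a few cue words, and the sense tied to the domain with the most cues present in the sentence is selected. The cue lists, the scoring rule and the function name `pick_sense` are illustrative assumptions, not a description of the systems cited above.

```python
# Senses of the noun "fly", keyed by the domain of interest in which each applies.
SENSES = {
    "fly": {
        "extermination": "a winged insect",
        "fishing": "a fancy fishhook",
        "sports": "a baseball hit high in the air",
    }
}

# Hypothetical cue words suggesting each domain.
DOMAIN_CUES = {
    "extermination": {"spray", "pest", "swatter"},
    "fishing": {"trout", "rod", "stream", "cast"},
    "sports": {"inning", "outfield", "batter", "caught"},
}


def pick_sense(noun, sentence):
    """Choose the sense whose domain has the most cue words in the sentence."""
    words = set(sentence.lower().replace(".", "").split())
    scores = {domain: len(words & cues) for domain, cues in DOMAIN_CUES.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None  # no contextual evidence; the ambiguity remains
    return SENSES[noun][best]


print(pick_sense("fly", "He tied a fly and cast it into the stream."))
print(pick_sense("fly", "The batter hit a fly to the outfield."))
```

With no cue words present the function declines to choose, which mirrors the point that context or other knowledge is needed to resolve the ambiguity.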

2. Pronouns 

Pronouns allow a simplified reference to previously used (or implied) nouns, sets or events. 
Where feasible, using pragmatics, pronoun antecedents are usually identified by reference to the 
most recent noun phrase having the same context as the pronoun. 

3. Ellipsis and Substitution 

Ellipsis is the phenomenon of not stating explicitly some words in a sentence, but leaving it to 
the reader or listener to fill them in. Substitution is similar — using a dummy word in place of the 
omitted words. Employing pragmatics, ellipses and substitutions are usually resolved by matching 
the incomplete statement to the structures of previous recent sentences — finding the best partial 
match and then filling in the rest from this matching previous structure. 

G. Knowledge Representation* 

As the AI approach to natural language processing is heavily knowledge based, it is not surpris- 
ing that a variety of knowledge representation (KR) techniques have found their way into the 
field. Some of the more important ones are: 

1. Procedural Representations — The meanings of words or sentences being expressed as 
computer programs that reason about their meaning. 


*More complete presentations on KR can be found in Chapter III of Barr and Feigenbaum (1981), and in Part C of this
volume.

2. Declarative Representations

a. Logic — Representation in First Order Predicate Logic, for example. 

b. Semantic Networks — Representations of concepts and relationships between concepts as 
graph structures consisting of nodes and labeled connecting arcs. 

3. Case Frames — (covered earlier) 

4. Conceptual Dependency — This approach (related to case frames) is an attempt to provide a 
representation of all actions in terms of a small number of semantic primitives into which input 
sentences are mapped (see, e.g., Schank and Riesbeck, 1981). The system relies on 11 primitive 
physical, instrumental and mental ACT’s (propel, grasp, speak, attend, PTRANS, ATRANS, etc.),
plus several other categories or concept types. 

5. Frame — A complex data structure for representing a whole situation, complex object or 
series of events. A frame has slots for objects and relations appropriate to the situation. 

6. Scripts — Frame-like data structures for representing stereotyped sequences of events to aid in 
understanding simple stories. 

H. Syntactic Parsing 

Parsing assigns structures to sentences. The following types have been developed over the years
for NLP (Barr and Feigenbaum, 1981).

1. Template Matching: Most of the early (and some current) NL programs performed parsing by
matching their input sentences against a series of stored templates.

2. Transition Nets: 

Phrase structure grammars can be syntactically decomposed using a set of rewrite rules such as
indicated in Figure III-1. Observe that a simple sentence can be rewritten as a Noun Phrase and a
Verb Phrase as indicated by:

S → NP VP

The noun phrase can be rewritten by the rule

NP → (DET) (ADJ*) N (PP*)

where the parentheses indicate that the item is optional, while the asterisks (associated with the
adjectives and prepositional phrases) indicate that any number of items may occur.
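
The noun phrase rule, together with the rule that a prepositional phrase is a preposition followed by a noun phrase, can be sketched as a pair of mutually recursive functions. This is a plain recursive recognizer, not a drawn transition network; the small lexicon, the tuple-based tree format and the greedy attachment of each prepositional phrase to the nearest noun are simplifying assumptions.

```python
# Word categories for a tiny vocabulary (illustrative).
LEXICON = {
    "the": "DET", "a": "DET",
    "long": "ADJ",
    "payload": "N", "tether": "N", "shuttle": "N",
    "on": "PREP", "under": "PREP",
}


def parse_np(words, i=0):
    """NP -> (DET) (ADJ*) N (PP*). Returns (tree, next_index) or None."""
    children = []
    if i < len(words) and LEXICON.get(words[i]) == "DET":      # optional determiner
        children.append(("DET", words[i]))
        i += 1
    while i < len(words) and LEXICON.get(words[i]) == "ADJ":   # any number of adjectives
        children.append(("ADJ", words[i]))
        i += 1
    if i >= len(words) or LEXICON.get(words[i]) != "N":        # the noun is required
        return None
    children.append(("N", words[i]))
    i += 1
    while True:                                                # any number of PPs
        pp = parse_pp(words, i)
        if pp is None:
            break
        tree, i = pp
        children.append(tree)
    return ("NP", children), i


def parse_pp(words, i):
    """PP -> PREP NP. Returns (tree, next_index) or None."""
    if i >= len(words) or LEXICON.get(words[i]) != "PREP":
        return None
    np = parse_np(words, i + 1)
    if np is None:
        return None
    tree, j = np
    return ("PP", [("PREP", words[i]), tree]), j


tree, end = parse_np("the payload on a tether under the shuttle".split())
print(end)   # 8: all eight words were consumed
print(tree)
```

In the resulting tree, “under the shuttle” is attached inside the noun phrase “a tether under the shuttle,” which is one of the possible readings; a fuller parser would have to consider the alternative attachment as well.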

An example of an analyzed noun phrase is shown in Figures III-2 and III-3. 

As the transition networks analyze a sentence, they can collect information about the word
patterns they recognize and fill slots in a frame associated with each pattern. Thus, they can
record whether a noun phrase is singular or plural, whether its nouns refer to persons (and, if
so, their gender), and other information needed to produce a deep structure. A simple approach to
collecting this information is to attach subroutines to be called for each transition. A transition
network with such subroutines attached is called an “augmented transition network,” or ATN. With
ATN’s, word patterns can be recognized. For each word pattern, we can fill slots in a frame. The
resulting filled frames provide a basis for further processing.

GRAMMAR

S → NP VP
NP → (DET) (ADJ*) N (PP*)
PP → PREP NP
VP → VTRAN NP

Figure III-1. A Transition Network for a Small Subset of English.
Each diagram represents a rule for finding the corresponding word pattern. Each rule can call on
other rules to find needed patterns.
After Graham (1979, p. 214).

NP: The payload on a tether under the shuttle
  DET: The
  N: payload
  PP: on a tether under the shuttle
    PREP: on
    NP: a tether under the shuttle
      DET: a
      N: tether
      PP: under the shuttle
        PREP: under
        NP: the shuttle
          DET: the
          N: shuttle

Figure III-2. Example Noun Phrase Decomposition.

Figure III-3. Parse Tree Representation of the Noun Phrase Surface Structure.
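
The idea of attaching a subroutine to each transition can be sketched with a very small network for simple noun phrases. Each arc names a word category, the next state, and an action that fills a slot in the frame. The states, the lexicon with its number feature, the slot names and the function `run_np` are assumptions made for illustration; a real ATN would also allow arcs that call other networks.

```python
# Lexicon: word -> (category, features). Entries are illustrative.
LEXICON = {
    "the": ("DET", {}),
    "a": ("DET", {}),
    "heavy": ("ADJ", {}),
    "payload": ("N", {"number": "singular"}),
    "payloads": ("N", {"number": "plural"}),
}


# Actions ("subroutines") attached to arcs; each fills slots in the frame.
def set_determiner(frame, word, features):
    frame["determiner"] = word


def add_adjective(frame, word, features):
    frame["adjectives"].append(word)


def set_head(frame, word, features):
    frame["head"] = word
    frame["number"] = features["number"]


# Network: state -> list of (category, next_state, action).
NP_NET = {
    "start": [("DET", "after_det", set_determiner),
              ("ADJ", "after_det", add_adjective),
              ("N", "done", set_head)],
    "after_det": [("ADJ", "after_det", add_adjective),
                  ("N", "done", set_head)],
}


def run_np(words):
    """Traverse the network over the words; return the filled frame or None."""
    state = "start"
    frame = {"type": "NP", "adjectives": []}
    for word in words:
        if state == "done":
            return None                      # words left over after the noun
        category, features = LEXICON.get(word, (None, {}))
        for arc_category, next_state, action in NP_NET[state]:
            if arc_category == category:
                action(frame, word, features)  # run the arc's subroutine
                state = next_state
                break
        else:
            return None                      # no arc accepts this word
    return frame if state == "done" else None


print(run_np(["the", "heavy", "payloads"]))
```

The returned frame records the determiner, adjectives, head noun and number, the kind of information a later stage could use in building a deep structure.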

3. Other Parsers 

Other parsing approaches have been devised, but ATN’s remain the most popular syntactic 
parsers. ATN’s are top-down parsers in that the parsing is directed by an anticipated sentence 
structure. An alternative approach is bottom-up parsing, which examines the input words along 
the string from left to right, building up all possible structures to the left of the current word 
as the parser advances. A bottom-up parser could thus build many partial sentence structures that 
are never used, but the diversity could be an advantage in trying to interpret input word strings 
that are not clearly delineated sentences or contain ungrammatical constructions or unknown 
words. There have been recent attempts to combine the top-down and bottom-up approaches
for NLP, in a manner similar to what has been done for Computer Vision.
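To make the contrast with top-down parsing concrete, the following is a small bottom-up chart sketch in Python. The binarized grammar, the lexicon and the function name are assumptions chosen for illustration (they are a simplification of the grammar in Figure III-1, not a description of any particular system). The parser moves left to right and, at each word, builds every constituent that can end at that word.

```python
from collections import defaultdict

# Hypothetical binarized grammar: (left child, right child) -> parent.
RULES = {
    ("DET", "N"): "NP",
    ("NP", "PP"): "NP",
    ("PREP", "NP"): "PP",
    ("VTRAN", "NP"): "VP",
    ("NP", "VP"): "S",
}
LEXICON = {"the": "DET", "a": "DET", "shuttle": "N", "payload": "N",
           "tether": "N", "on": "PREP", "carries": "VTRAN"}

def bottom_up_parse(words):
    """Return chart, where chart[end] is the set of (start, category) spans."""
    chart = defaultdict(set)
    for j, word in enumerate(words):
        cat = LEXICON.get(word)
        if cat is None:
            continue                      # unknown word: leave a gap, keep going
        agenda = [(j, cat)]
        while agenda:
            start, cat = agenda.pop()
            if (start, cat) in chart[j + 1]:
                continue
            chart[j + 1].add((start, cat))
            # Combine with every constituent that ends where this one starts.
            for left_start, left_cat in list(chart[start]):
                parent = RULES.get((left_cat, cat))
                if parent is not None:
                    agenda.append((left_start, parent))
    return chart

chart = bottom_up_parse("the shuttle carries a payload".split())
is_sentence = (0, "S") in chart[5]
```

When an unknown word appears, no complete S is found, but the chart still holds the partial constituents on either side of the gap, which is the kind of robustness the paragraph above attributes to bottom-up parsing.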

For a recent overview of parsing approaches see Slocum (1981). 

I. Semantics, Parsing and Understanding 

The role of syntactic parsing is to construct a parse tree or similar structure of the sentence to 
indicate the grammatical use of the words and how they are related to each other. The role of the 
semantic processing is to establish the meaning of the sentence. This requires facing up to all the 
cantankerous ambiguities discussed earlier. 

Charniak (1981) observes that there have been two main lines of attack on word sense
ambiguity. One is the use of discrimination nets (Rieger and Small, 1979) that utilize the syntactic
parse tree (by observing the grammatical role that the word plays, such as taking a direct object, 
etc.) in helping to decide the word sense. The other approach is based on the frame/script idea 
(used, e.g., for story comprehension) that provides a context and the expected sense of the word 
(see e.g., Schank and Abelson, 1977). 

Charniak indicates that semantics at the level of word sense is not the end of the parsing
process; what is ultimately desired is understanding or comprehension (associated with pragmatics).
Here frames, scripts and more advanced constructs such as plans, goals, and knowledge
structures (see, e.g., Schank and Riesbeck, 1981) play an important role.

J. Natural Language Processing (NLP) Systems 

As indicated below, various NLP systems have been developed for a variety of functions. 

1. Kinds 

a. Question Answering Systems 

Question answering natural language systems have perhaps been the most popular of the NLP 
research systems. They have the advantage that they usually utilize a data-base for a limited 
domain and that most of the user discourse is limited to questions. 

b. Natural Language Interfaces (NLI’s) 

These systems are designed to provide a painless means of communicating questions or
instructions to a complex computer program.

c. Computer-Aided Instruction (CAI)

Arden (1980, p. 465) states: 

One type of interaction that calls for ability in natural languages is the interaction needed for effective teaching 
machines. Advocates of computer-aided instruction have embraced numerous schemes for putting the computer 
to use directly in the educational process. It has long been recognized that the ultimate effectiveness of teaching 
machines is linked to the amount of intelligence embodied in the programs. That is, a more intelligent program 
would be better able to formulate the questions and presentations that are most appropriate at a given point in a 
teaching dialogue, and it would be better equipped to understand a student’s response, even to analyze and 
model the knowledge of the student, in order to tailor the teaching to his needs. 

d. Discourse 

Systems that are designed to understand discourse (extended dialogue) usually employ 
pragmatics. Pragmatic analysis requires a model of the mutual beliefs and knowledge held by the 
speaker and listener. 

e. Text Understanding 

Though Schank (see Schank and Riesbeck, 1981) and others have addressed themselves to this 
problem, much more remains to be done. Techniques for understanding printed text include 
scripts and causative approaches. 

f. Text Generation

There are two major aspects of text generation: one is the determination of the content and tex- 
tual shape of the message, the second is transforming it into natural language. There are two ap- 
proaches for accomplishing this. The first is indexing into canned text and combining it as ap- 
propriate. The second is generating the text from basic considerations. McDonald’s thesis (1980) 
provides one of the most sophisticated approaches to text generation. 

2. Research NLP Systems 

Until recently, virtually all of the NLP systems generated were of a research nature. These NLP 
systems basically were aimed at serving five functions: 

a. Interfaces to Computer Programs 

b. Data Base Retrieval 

c. Text Understanding 

d. Text Generation 

e. Machine Translation 

Gevarter (1983) includes a survey of research NLP systems. 

3. Commercial Systems: 

The commercial systems available today (together with their approximate price) are listed in 
Table III-2. Several of these systems are derivatives of past research NLP systems. 





TABLE III-2. Some Commercial Natural Language Systems.

System:        INTELLECT (derivative of ROBOT), $50K/system; also distributed as
               ON-LINE ENGLISH (Cullinane) and GRS Executive (Information Sciences)
Organization:  Artificial Intelligence Corp., Waltham, Mass.
Purpose:       NLI for data base retrieval (other extensions underway)
Comments:      • Several hundred systems sold
               • Takes about 2 weeks to implement for a new data base
               • Written in PL-1
               • Available for mainframes

System:        PEARL (based on SAM and PAM), $250K/system
Organization:  Cognitive Systems, New Haven, Conn.
Purpose:       Custom NLI’s. The first system, Explorer, is an interface to an existing
               map generating system. Others are interfaces to data bases.
Comments:      • Large start-up cost in building the knowledge base
               • Several systems have been, and are being, built
               • Written in LISP

System:        Straight Talk (derivative of LIFER), $660
Organization:  Dictaphone; written by Symantec, Sunnyvale, CA
Purpose:       Highly portable NLI for DBMS for micro-computers
Comments:      • Written in PASCAL
               • Designed to be very compact and efficient
               • Available about Nov. 1983
               • User customized

System:        SAVVY, $950
Organization:  SAVVY Marketing International, Sunnyvale, CA
Purpose:       System interface for micro-computers
Comments:      • Not linguistic. Uses adaptive (best fit) pattern matching to strings
                 of characters
               • Released 3/82
               • User customized

System:        Weidner System, $16K/language direction
Organization:  Weidner Communications Corp., Provo, UT
Purpose:       Semi-automatic natural language translation
Comments:      • Linguistic approach. Written in FORTRAN IV
               • Translation with human editing is approximately 100 words/hr (up to
                 eight times as fast as human alone)
               • Approx. 20 sold by end of 1982, mainly to large multi-national
                 corporations

System:        ALPS
Organization:  ALPS, Provo, UT
Purpose:       Interactive natural language translation
Comments:      • Linguistic approach
               • Uses a dictionary that provides the various translations for technical
                 words as a display to human translator, who then selects among the
                 displayed words

System:        NLMENU
Organization:  Texas Instruments, Inc., Dallas, TX
Purpose:       NLI to relational data bases
Comments:      • Menu driven NL query system
               • All queries constructed from menu fall within linguistic and conceptual
                 coverage of the system. Therefore, all queries entered are successful.
               • Grammars used are semantic grammars written in a context-free grammar
                 formalism
               • Producing an interface to any arbitrary set of relations is automated
                 and only requires a 15-30 minute interaction with someone knowledgeable
                 about the relations in question
               • System will be available late in 1983 as a software package for a
                 micro-computer


K. State of the Art 

It is now feasible to use computers to deal with natural language input in highly restricted con- 
texts. However, interacting with people in a facile manner is still far off, requiring understanding 
of where people are coming from — their knowledge, goals and moods. 

In today’s computing environment, the only systems that perform robustly and efficiently are
Type A systems — those that do not use explicit world models, but depend on key word or
pattern matching and/or semantic grammars. In actual working systems, for both understanding and
text generation, ATN-like grammars can be considered the state of the art.
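As a rough illustration of what key word or pattern matching without a world model can look like, the following Python sketch maps a few surface patterns onto query templates. The patterns, the query notation and the function name are all invented for this example and do not describe any of the commercial systems listed above.

```python
import re

# Hypothetical patterns for a toy personnel data base. Each entry pairs a
# surface pattern with the query template it triggers.
PATTERNS = [
    (re.compile(r"how many (\w+) (?:are )?in (\w+)", re.IGNORECASE),
     "COUNT {0} WHERE location = {1}"),
    (re.compile(r"who (?:is|are) the (\w+) of (\w+)", re.IGNORECASE),
     "SELECT {0} WHERE unit = {1}"),
]

def interpret(sentence):
    """Return the query for the first matching pattern, or None if none match."""
    for pattern, template in PATTERNS:
        match = pattern.search(sentence)
        if match:
            return template.format(*match.groups())
    return None

query = interpret("How many engineers are in Houston?")
```

Such a matcher is fast and predictable inside its narrow domain, but it returns nothing for any input that falls outside its patterns, which is the restriction noted in the text.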

L. Principal U.S. Participants in NLP 

1. Research and Development* 

Non-Profit 

SRI 

MITRE 


*A review of current research in NLP is given in Kaplan (1982).

Universities

Yale U. — Dept. of Computer Science
U. of CA, Berkeley — Computer Science Div., Dept. of EECS
Carnegie-Mellon U. — Dept. of Computer Science
U. of Illinois, Urbana — Coordinated Science Lab.
Brown U. — Dept. of Computer Science
Stanford U. — Computer Science Dept.
U. of Rochester — Computer Science Dept.
U. of Mass., Amherst — Department of Computer and Information Science
SUNY, Stony Brook, Dept. of Computer Science
U. of CA, Irvine, Computer Science Dept.
U. of PA — Dept. of Computer and Infor. Science
GA Institute of Technology — School of Infor. and Computer Science
USC — Infor. Science Institute
MIT — AI Lab.
NYU — Computer Science Dept. and Linguistic String Project
U. of Texas at Austin — Dept. of Computer Science
Cal. Inst. of Tech.
Brigham Young U. — Linguistics Dept.
Duke U. — Dept. of Computer Science
N. Carolina State — Dept. of Computer Science
Oregon State U. — Dept. of Computer Science
Purdue U.

Industrial 

BBN 

TRW Defense Systems 
IBM, Yorktown Heights, N.Y. 

Burroughs 
Sperry Univac

Systems Development Corp., Santa Monica 

Hewlett Packard 

Martin Marietta, Denver 

Texas Instruments, Dallas 

Xerox PARC 

Bell Labs 

Institute of Scientific Information, Phila., PA 
GM Research labs, Warren, MI 
Honeywell

2. Principal U.S. Government Agencies Funding NLP Research
ONR (Office of Naval Research) 

NSF (National Science Foundation) 

DARPA (Defense Advanced Research Projects Agency) 

3. Commercial NLP Systems 

Artificial Intelligence Corp., Waltham, Mass. 

Cognitive Systems Inc., New Haven, Conn.

Symantec, Sunnyvale, CA 
Texas Instruments, Dallas, TX 
Weidner Communications, Inc., Provo, Utah 
Savvy Marketing International, Sunnyvale, CA 
ALPS, Provo, UT 


4. Non-U.S.

U. of Manchester, England 

Kyoto U., Japan 

Siemens Corp., Germany

U. of Strathclyde, Scotland 

Centre National de la Recherche Scientifique, Paris 

U. di Udine, Italy 

U. of Cambridge, England 

Philips Res. Labs, The Netherlands

M. Forecast 

Commercial natural language interfaces (NLI’s) to computer programs and data base manage- 
ment systems are now becoming available. The imminent advent of NLI’s for micro-computers is 
the precursor for eventually making it possible for virtually anyone to have direct access to 
powerful computational systems. 

Because the cost of computing has continued to fall while the cost of programming has not, it
has already become cheaper in some applications to create NLI systems (that utilize subsets of
English) than to train people in formal programming languages.

Computational linguists and workers in related fields are devoting considerable attention to the 
problems of NLP systems that understand the goals and beliefs of the individual communicators. 
Though progress has been made, and feasibility has been demonstrated, more than a decade will 
be required before useful systems with these capabilities will become available. 

One of the problems in implementing new installations of NLP systems is gathering
information about the applicable vocabulary and the logical structure of the associated data bases.
Work is now underway to develop tools to help automate this task. Such tools should be available
within 5 years.

For text understanding, experimental programs have been developed that “skim” stylized text
such as short disaster stories in newspapers (DeJong, 1982). Despite the practical problems of
sufficient world knowledge and the extension of language required, practical tools emerging from
these efforts should be available to provide assistance to humans doing text understanding within
this decade.

The NRL Computational Linguistic Workshop (1981) concluded that text generation tech- 
niques are maturing rapidly and new application possibilities will appear within the next five 
years. 

The NRL workshop also indicated that: 

Machine aids for human translators appear to have a brighter prospect for immediate application than fully 
automatic translation; however, the Canadian French-English weather bulletin project is a fully automatic 
system in which only 20% of the translated sentences require minor rewording before public release. An am- 
bitious common market project involving machine translation among six European languages is scheduled to 
begin shortly. Sixty people will be involved in that undertaking which will be one of the largest projects under- 
taken in computational linguistics.* The panel was divided in its forecast on the five year perspective of machine 
translation but the majority were very optimistic. 

Nippon Telegraph and Telephone Corp. in Tokyo has a machine translation AI project
underway. An experimental system for translating from Japanese to English and vice versa is now being
demonstrated. In addition, the recently initiated Japanese Fifth Generation Computer effort has 
computer-based natural language understanding as one of its major goals. 

In summary, natural language interfaces using a limited subset of English are now becoming 
available. Hundreds of specialized systems are already in operation. Major efforts in text 
understanding and machine translation are underway, and useful (though limited) systems will be 
available within the next five years. Systems that are heavily knowledge-based and handle more 
complete sets of English should be available within this decade. However, systems that can handle 
unrestricted natural discourse and understand the motivation of the communicators remain a
distant goal, probably requiring more than a decade before useful systems appear.

IV. SPEECH RECOGNITION AND SPEECH UNDERSTANDING


A. Introduction 

Speech is our fastest means of discourse communication, being about twice as fast as the 
average typist. It is also nearly effortless: speech doesn’t need visual or physical contact and it 
places few restrictions on the use of the hands or the mobility of the body. Speech is thus well 
suited to communication with a machine when the individual is engaged in other activities. Its ef- 
fortlessness also makes it desirable for operating a computer, and it is a long term candidate for 
direct text preparation (automatic dictation). 

Speech understanding systems have all the difficulties of natural language understanding plus 
the problem of interpreting the speech signal with all its noise and variability. As a result, speech 
understanding is one of the most difficult AI subjects, being a perception task related to the scene 
understanding problem in computer vision. Though the constraining aspects of natural language 
help reduce the magnitude of the task, it remains a major problem area. 

Speech systems can be categorized into speech recognition systems and speech understanding 
systems, the former task being considerably easier. In addition, the systems further divide into 
those that work with isolated words and those that can handle connected speech, the latter being 
perhaps an order of magnitude more difficult than the former. 

Finally, speech systems are also classified as speaker dependent or speaker independent. The
former must be trained to recognize the particular speakers using them.

The heart of the speech problem (that gives rise to the above classifications) is the difficulty of 
recognizing the speech signal, but before we explore that area, let us briefly look at applications 
for speech devices. 

B. Applications 

There are many applications emerging for speech recognition and speech understanding 
systems. Some of these are listed in Tables IV-1 and IV-2. 

C. The Nature of Speech Sounds: 

It is beginning to be realized that acoustics and phonetics may be the key to speech under- 
standing. Zue (1981) argues that human spectrograph-reading experiments indicate that phonetic 
recognition in speech systems can be improved substantially, which would result in much more 
capable speech systems. 

Speech recognition is based primarily on the identification of words. An adult speaker may 
know 100,000 of the 300,000 words in the English language. Each language has a basic set of 
speech sounds called phonemes. In English there are only about 40 phonemes, compared with 
some 10,000 for the next largest speech unit, the syllable. 

TABLE IV-1. Speech Recognition Applications.

Manufacturing Processes and Control

• Quality control data entry into computers
• Shipping and receiving — record entry, package sorting
• Maintenance and repair orders — part availability, work needed or under way
• CAD/CAM

Office Automation

• Executive work station
• Word processing
• Data entry
• Control functions

Technical Data Gathering

• Cartography — inputs when working with maps
• Working with blueprints
• Medical applications:
    Dental records
    Pathology
    Services for the handicapped
    Operating room logging
    Command/control of medical instrumentation

Security Applications

• Building access
• Computer file access
• Communications security
• Speaker verification/identification

Consumer Products Applications

• Control functions
• Status queries

Equipment Subsystem Operation

• Aircraft
• Spacecraft
• Military equipment

TABLE IV-2. Speech Understanding Applications.

• Universal access to large data bases via the telephone network
• Automatic telephone transaction systems — airline reservations and inquiries
• Command and control
    — Military
    — Business
• Operation of complex machines

The sounds that make up human speech are generated by the flow of air through the vocal tract
in three ways (Levinson and Liberman, 1981):

1. The vocal cords can be made to vibrate, resulting in the frequency of the sound referred to
as pitch.

2. A constriction can be formed in the vocal tract, narrow enough to cause turbulence,
resulting in noise-like sounds, like that used to produce “f”.

3. Pressure built up behind a closure (such as the lips) can release a burst of acoustic energy as
in the pronunciation of consonants, such as “p”, “t” and “k”.

These three sources of speech sound are shaped acoustically by the time-varying physical shape
of the vocal tract.

One way to characterize the speech signal is by its Fourier transform, which specifies the
amplitude and phase of each of the frequencies in the Fourier series of the signal. As the phase
makes little perceptual difference, the signal is represented in practice by its amplitude spectrum,
in a representation called a spectrogram.
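A minimal Python sketch of the amplitude spectrum follows, using a direct discrete Fourier transform for clarity rather than speed. The sampling rate, the frame length and the test tone are assumptions chosen for illustration. A practical system would use a fast Fourier transform and compute one such spectrum per short time frame.

```python
import cmath
import math

def amplitude_spectrum(samples):
    """Return the amplitudes |X[k]|/N for bins k = 0 .. N/2 of a real signal.

    The phase of each bin is discarded, as described in the text.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        spectrum.append(abs(total) / n)
    return spectrum

# Assumed example: a 1 kHz tone sampled at 8 kHz, analyzed over 64 samples.
# Bin k corresponds to the frequency k * RATE / N, i.e. steps of 125 Hz.
RATE, N = 8000, 64
tone = [math.sin(2 * math.pi * 1000 * t / RATE) for t in range(N)]
spectrum = amplitude_spectrum(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_frequency = peak_bin * RATE / N
```

Shifting the phase of the tone leaves the amplitude spectrum unchanged, which is why discarding the phase loses little for recognition purposes.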

D. Isolated Word Recognition 

Figure IV-1 indicates a basic paradigm for speech recognition. The signal is first operated upon
to emphasize the 2 to 3 kHz frequency range, filtered to chop off high frequencies (>8 kHz),
then digitized. The end points of the word are detected, and a set of parameters representing the
word is generated. This set is then matched against the stored parameter sets in the system’s
vocabulary, and the word with the closest match is chosen. For a given word, the acoustic signal
varies in both duration and amplitude each time the same speaker says it. Thus it may have to be
warped in time to achieve the best comparison with the reference, a task that is one of the
toughest problems for a speech recognizer. The warping is usually accomplished by dynamic
programming.
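The warping step can be sketched in Python as a standard dynamic time warping recurrence. The one-number-per-frame features and the two stored templates are invented for illustration. A real recognizer would compare vectors of spectral or LPC parameters and usually constrains the allowed warping path.

```python
def dtw_distance(utterance, template):
    """Cost of the best time alignment between two feature sequences."""
    inf = float("inf")
    n, m = len(utterance), len(template)
    # cost[i][j] = best cost of aligning the first i frames with the first j.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(utterance[i - 1] - template[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch the template
                                     cost[i][j - 1],      # stretch the utterance
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(utterance, templates):
    """Return the vocabulary word whose template warps to the input most cheaply."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# Hypothetical templates (one scalar feature per frame).
TEMPLATES = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 2]}
slow_yes = [1, 1, 3, 3, 4, 3, 1]          # the same word, spoken more slowly
best_word = recognize(slow_yes, TEMPLATES)
```

Because repeated frames can be absorbed by the alignment, the slowed utterance matches its template at zero cost even though the two sequences differ in length.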

Doddington and Schalk (1981, p. 28) state that: 

The most common means of feature extraction is direct measurement of spectrum amplitude, with, for example, 
a set of 16 bandpass filters. Another means is measurement of the zero-crossing rate of the signal in several broad 
frequency bands to give an estimate of the formant [resonant] frequencies in these bands. Yet another means is 
representing the speech signal in terms of the parameters of a filter whose spectrum best fits that of the input 
speech signal. This technique known as linear predictive coding (LPC) has gained popularity because it is 
efficient, accurate, and simple. 
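Of the feature-extraction methods named in the quotation, the zero-crossing measurement is the simplest to sketch. The Python fragment below is a simplified illustration: it treats the whole signal as a single band, whereas the method described above would first split the signal into several broad frequency bands. The test signal and its parameters are assumptions.

```python
import math

def zero_crossing_rate(samples, sample_rate):
    """Number of sign changes per second in the sampled signal."""
    crossings = sum(1 for previous, current in zip(samples, samples[1:])
                    if (previous < 0) != (current < 0))
    duration = len(samples) / sample_rate
    return crossings / duration

def estimate_frequency(samples, sample_rate):
    """A sinusoid crosses zero twice per cycle, so halve the crossing rate."""
    return zero_crossing_rate(samples, sample_rate) / 2

# Assumed example: one second of a 500 Hz tone sampled at 8 kHz.
RATE = 8000
tone = [math.sin(2 * math.pi * 500 * t / RATE + 0.3) for t in range(RATE)]
estimate = estimate_frequency(tone, RATE)
```

For a band dominated by a single resonance, this estimate approximates the formant frequency in that band, at a small fraction of the cost of a spectral analysis.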
Figure IV-1. Basic Speech-Recognition Paradigm.

After Zue (1982, p. 51).

E. Recognizing Continuous Speech 

For continuous speech, rather than attempting to match all possible word patterns, it is often 
more efficient to work with speech units much smaller than words, particularly phonemes. Break- 
ing down the speech signal into these smaller components and giving them symbols, is referred to 
as segmentation and labeling. Usually, several phoneme labels are assigned to each segment by a 
pattern-matching process, which also assigns a probability value representing the goodness of the 
match. With the appropriate acoustic-phonetic knowledge, it is possible to combine, regroup, 
and delete segments to form larger phoneme units. The lexical knowledge of word pronunciation 
can now be used to generate a multiplicity of word hypotheses. For a sufficiently limited 
vocabulary, and perhaps also employing some syntactic and word boundary knowledge, speech 
recognition can be achieved. 
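The step from labeled segments to word hypotheses can be sketched as follows. The segment labels, their scores, the pronunciations and the function name are all invented for illustration, and the sketch assumes one phoneme per segment, whereas the text notes that segments may first have to be combined, regrouped or deleted.

```python
# Each segment carries several candidate phoneme labels with match scores
# (hypothetical values).
SEGMENTS = [
    {"s": 0.7, "f": 0.3},
    {"ih": 0.6, "eh": 0.4},
    {"k": 0.6, "t": 0.4},
    {"s": 0.8, "z": 0.2},
]

# Hypothetical lexicon of word pronunciations.
PRONUNCIATIONS = {
    "six": ["s", "ih", "k", "s"],
    "sits": ["s", "ih", "t", "s"],
    "fix": ["f", "ih", "k", "s"],
    "set": ["s", "eh", "t"],
}

def word_hypotheses(segments, start, lexicon):
    """Return (score, word, end_segment) for each word that fits at `start`."""
    hypotheses = []
    for word, phones in lexicon.items():
        end = start + len(phones)
        if end > len(segments):
            continue
        score = 1.0
        for segment, phone in zip(segments[start:end], phones):
            score *= segment.get(phone, 0.0)   # 0 if the label was never proposed
        if score > 0:
            hypotheses.append((score, word, end))
    return sorted(hypotheses, reverse=True)

ranked = word_hypotheses(SEGMENTS, 0, PRONUNCIATIONS)
```

Several competing words survive with different scores, and it is left to syntactic and word boundary knowledge to choose among them.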

F. Speech Understanding 

Arden (1980, pp. 475, 478) observes that: 

Speech-understanding systems differ somewhat from recognition systems, in that they have access to and make 
effective use of task-specific knowledge in the analysis and interpretation of speech. Further, the criteria for per- 
formance are somewhat relaxed, in that the errors that count are not the errors in speech recognition, but errors 
in task accomplishment. 

To successfully decode the unknown utterance, a speech perception system must effectively use the many diverse 
sources of knowledge about the language, the environment, and the context. These sources of knowledge include 
the characteristics of speech sounds (acoustic-phonetic), variability in pronunciation (phonology), the stress and 
intonation patterns of speech (prosodics), the sound patterns of words and sentences (lexicon), the grammatical
structure of language (syntax), the meaning of words and sentences (semantics), and the context of the conversa- 
tion (pragmatics) . . . 

What makes speech perception a challenging and difficult area of A. I. is the fact that error and ambiguity 
permeate all the levels of the speech-decoding process .... 

The grammatical structure of sentences can be viewed principally as a mechanism for reducing search by restrict- 
ing the number of acceptable alternatives .... 

Barr and Feigenbaum (1981, p. 332) note that the types of knowledge at the various levels in 
processing spoken knowledge include (from the signal level up): 

1. Phonetics — representations of the physical characteristics of the sounds in all of the words in the vocabulary.

2. Phonemics — rules describing variations in pronunciation that appear when words are spoken together in 
sentences (coarticulation across word boundaries, “swallowing” of syllables, etc.); 

3. Morphemics — rules describing how morphemes (units of meaning) are combined to form words (formation 
of plurals, conjugations of verbs, etc.); 

4. Prosodics — rules describing fluctuation in stress and intonation across a sentence;

5. Syntax — the grammar or rules of sentence formation resulting in important constraints on the number of 
sentences (not all combinations of words in the vocabulary are legal sentences); 

6. Semantics — the “meaning” of words and sentences, which can also be viewed as a constraint on the speech 
understander (not all grammatically legal sentences have a meaning — e.g., The snow was loud); and, finally, 

7. Pragmatics — rules of conversation (in a dialogue, a speaker’s response must not only be a meaningful 
sentence but also be a reasonable reply to what was said to him). For instance, it is pragmatic knowledge that 
tells us that the question “Can you tell me what time it is?” requires more than just a Yes or No response. 

Using this knowledge, the hierarchical structure leading to speech understanding can be
characterized as shown in Figure IV-2.

Figure IV-2. The Processing Hierarchy in Speech Understanding

G. The ARPA Speech Understanding Research (SUR) Project

1. Introduction

In 1971, ARPA (The Advanced Research Projects Agency) initiated a five year speech 
understanding research effort that proved to be one of the most significant projects in AI history. 
Not only did it greatly advance our knowledge of speech, but it also provided new insights on how 
to structure and control a complex “expert system.” 

Lea and Shoup (1979) reported that the ARPA SUR project had the highly ambitious goals of 
understanding, with 90% accuracy, continuous speech from a 1000 word vocabulary spoken by 
several cooperative speakers under near ideal conditions of quiet rooms and high-fidelity equip- 
ment. It was intended that the processing take no more than several times real-time using large 
very fast computers. 

There were three principal complete systems developed under the project — HEARSAY II and 
HARPY at Carnegie Mellon University (CMU), and HWIM (Hear What I Mean) at Bolt, 
Beranek and Newman (BBN). In 1976, the ARPA goals were essentially met at CMU by HARPY
exhibiting a 95% accuracy and HEARSAY II achieving a 90% accuracy. HWIM had a substan- 
tially lower accuracy, but utilized a more difficult vocabulary. (HWIM’s domain was Travel 
Budget Management. HEARSAY II’s and HARPY’s was Retrieval of AI Documents.) These 
three systems were heavily knowledge-based and are now considered to be expert systems. 

All the ARPA SUR systems utilized a combination of bottom-up and top-down processing.
The lower levels used knowledge about the variable phonetic composition of the words in the
vocabulary (lexicon) to interpret pieces of the speech signal by comparing them with prestored
patterns. The top level aided in recognition by building expectations about which words the speaker
was likely to say, using syntactic and semantic constraints (Barr and Feigenbaum, 1982,
pp. 326-327).


2. HEARSAY II 

HEARSAY II is characterized by its cooperative problem-solving system architecture (see 
Figure IV-3) which employs a set of programmed “specialists” (Knowledge Sources: KS’s) inter- 
acting via a shared common blackboard on which their decisions were recorded. The blackboard 
can be visualized as a global data structure representing a multi-level network of alternative 
hypotheses. 

HEARSAY has a total of 12 KS’s, which at the lower levels created syllable class hypotheses 
from segments, word hypotheses from syllables, etc. At the higher levels, KS’s acted to: predict 
all possible words that might syntactically precede or follow a phrase, create phrase hypotheses 
from verified contiguous word-phrase pairs, etc. 

The majority of the hypotheses contributed by the KS’s at any level did not end up in the final 
interpretation of the sentence. Instead, only the most likely hypotheses were chosen for expan- 
sion. The individual KS’s operated somewhat independently and asynchronously through 
pattern-invoked programs when matching patterns appeared on the blackboard. To economize 
on computing resources, each hypothesis was rated and (using an appropriate scheduling routine) 
the most likely patterns were expanded first. 
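The control scheme described above can be sketched in a few lines of Python. This is a toy illustration only, not HEARSAY II’s actual design: the two knowledge sources, their tables, the rating arithmetic and all names are assumptions. It shows a shared blackboard, knowledge sources that fire when a hypothesis they recognize appears, and a scheduler that always expands the highest rated hypothesis first.

```python
import heapq

class Blackboard:
    """Shared store of hypotheses plus an agenda ordered by rating."""

    def __init__(self):
        self.hypotheses = []   # every hypothesis ever posted
        self._agenda = []      # heap of (-rating, sequence number, hypothesis)
        self._sequence = 0

    def post(self, level, content, rating):
        hypothesis = {"level": level, "content": content, "rating": rating}
        self.hypotheses.append(hypothesis)
        heapq.heappush(self._agenda, (-rating, self._sequence, hypothesis))
        self._sequence += 1

    def pop_best(self):
        return heapq.heappop(self._agenda)[2] if self._agenda else None

def syllables_to_word(hypothesis, board):
    """Knowledge source: proposes a word from a syllable-level hypothesis."""
    words = {("shut", "tle"): "shuttle", ("pay", "load"): "payload"}
    if hypothesis["level"] == "syllables" and hypothesis["content"] in words:
        board.post("word", words[hypothesis["content"]], hypothesis["rating"] * 0.9)

def word_to_phrase(hypothesis, board):
    """Knowledge source: proposes a phrase from a word-level hypothesis."""
    if hypothesis["level"] == "word":
        board.post("phrase", "the " + hypothesis["content"], hypothesis["rating"] * 0.9)

def run(board, knowledge_sources, max_steps):
    """Expand the best-rated hypothesis first, within a fixed step budget."""
    for _ in range(max_steps):
        hypothesis = board.pop_best()
        if hypothesis is None:
            break
        for knowledge_source in knowledge_sources:
            knowledge_source(hypothesis, board)
    return board

board = Blackboard()
board.post("syllables", ("shut", "tle"), 0.8)
board.post("syllables", ("pay", "load"), 0.3)
run(board, [syllables_to_word, word_to_phrase], max_steps=3)
phrases = [h["content"] for h in board.hypotheses if h["level"] == "phrase"]
```

With a budget of three steps, the well-rated syllable hypothesis is carried up to the phrase level while the poorly rated one is never expanded, which is how rating and scheduling economize on computing resources.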
Figure IV-3. A Block Diagram of the CMU Hearsay-II System Organization

3. HARPY

A crude way of thinking of HARPY is as a compiled version of HEARSAY II. HARPY uses a 
single precompiled network knowledge structure. Barr and Feigenbaum (1981, p. 349) report: 

The network contains knowledge at all levels: acoustic, phonemic, lexical, syntactic, and semantic. It stores 
acoustic representations of every possible pronunciation of the words in all of the sentences that HARPY 
recognizes. The alternative sentences are represented as paths through the network, and each node in the network 
is a template of allophones (distinctive variations of phonemes, dependent on adjacent phonemes). 

The paths through the network can be thought of as “sentence templates,” much like the word templates used in 
isolated-word recognition. 

HARPY uses a heuristic method called “beam search” for searching for the sentence in the
network that most closely matches the input signal. HARPY proceeds from left to right through
the network, matching spoken sounds to allophonic states and assigning scores based on the
goodness of the match. HARPY keeps the paths with the best cumulative scores, pruning away
others which fall some threshold amount below the best scoring path (Erman et al., 1980).
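The pruning rule can be sketched in Python as follows. The tiny network, its scalar templates and the distance measure are assumptions made for illustration. The sketch also advances exactly one state per input segment, a simplification not claimed for HARPY itself. Scores are expressed as costs, so lower is better and a path is pruned when its cost exceeds the best cost by more than the beam width.

```python
# Hypothetical network: each state has a scalar template, and SUCCESSORS
# lists the states that may follow it.
TEMPLATES = {"s": 1.0, "f": 1.4, "ih": 3.0, "eh": 3.6, "k": 5.0, "t": 6.0}
SUCCESSORS = {
    "START": ["s", "f"],
    "s": ["ih", "eh"], "f": ["ih"],
    "ih": ["k", "t"], "eh": ["t"],
    "k": [], "t": [],
}

def beam_search(observations, beam_width):
    """Return (cost, path) of the best surviving path, or None if all paths die."""
    paths = [(0.0, ["START"])]
    for observation in observations:
        extended = []
        for cost, path in paths:
            for state in SUCCESSORS[path[-1]]:
                mismatch = abs(observation - TEMPLATES[state])
                extended.append((cost + mismatch, path + [state]))
        if not extended:
            return None
        best = min(cost for cost, _ in extended)
        # Keep only the paths within beam_width of the best one.
        paths = [(cost, path) for cost, path in extended
                 if cost <= best + beam_width]
    return min(paths)

cost, path = beam_search([1.1, 3.1, 5.2], beam_width=0.5)
```

Because pruned paths are never revisited, there is no backtracking. A beam that is too narrow can therefore discard the correct path early, which is the price paid for the speed of the method.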

4. HWIM 

The HWIM (Hear What I Mean) speech understanding system was developed at BBN. 
HWIM’s domain was that of travel budget management. HWIM’s organization is shown in 
Figure IV-4. The lower components digitize the speech signal and generate a parametric represen- 
tation of it, which is then segmented and labeled into phonemes which are ranked as to the quality 
of their match. These ranked phonemes are pictured as a segmented lattice, which is a graph that 
is divided into time segments and read from left to right. This graph is matched against a
dictionary of word pronunciations (stored as a network with phonemes for nodes) by lexical retrieval
components which generate word hypotheses.

HWIM’s higher levels include information about trips (semantics), syntax and word verifica- 
tion. The verification component takes the pronunciation of hypothesized words and generates a 
synthesized parameter representation that is compared to the parameters generated from the ac- 
tual signal. 

HWIM has a central control which uses the system’s knowledge sources as subroutines. The
system extends bottom-up theories using the top-down syntactic and semantic components. The
system expands its hypotheses outward from the first recognized word in the sentence.

5. Summary of the ARPA SUR Program

ARPA’s program did not result in a useable speech understanding system. The resulting 
systems were too slow, too restricted and required large computational resources. However, it did 
discover and elucidate much new information about speech, and developed new architectural in- 
sights, particularly the blackboard architecture that has since been used in other AI systems. Per- 
formances of the different systems were difficult to compare because of the different vocabularies 
and domains employed. One critical factor in comparison is the average branching factor (ABF). 
This refers to the average number of words that might come next after each word in a legal
sentence. Table IV-3 summarizes the three major ARPA SUR projects. Note that the ABF is 196
for HWIM’s database retrieval task, versus 33 for HEARSAY’s and HARPY’s document
retrieval task. 
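The definition of the average branching factor can be made concrete with a short Python sketch. The toy successor table is invented for illustration, and the choice to average only over words that have at least one legal continuation is an assumption, since the text does not say how sentence-final words are counted.

```python
def average_branching_factor(successors):
    """Mean number of words that may legally follow each word.

    `successors` maps each word (and a start marker) to the set of words the
    grammar allows next. Words with no continuation are left out of the mean.
    """
    counts = [len(following) for following in successors.values() if following]
    return sum(counts) / len(counts)

# Hypothetical fragment of a document retrieval grammar.
TOY_GRAMMAR = {
    "<start>": {"list", "show"},
    "list": {"papers", "authors", "titles"},
    "show": {"papers"},
    "papers": set(),
    "authors": set(),
    "titles": set(),
}
abf = average_branching_factor(TOY_GRAMMAR)
```

Since the number of candidate word sequences grows roughly as the ABF raised to the sentence length, a sixfold difference in ABF corresponds to a far larger difference in the search a system must perform.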
Figure IV-4. Block Diagram of the BBN HWIM System Organization

TABLE IV-3. Summary of ARPA’s Speech Understanding Systems

Name/Org:        HEARSAY II, CMU
Domain/Purpose:  AI publications document retrieval
Approach:        Utilizes cooperating independent system experts (Knowledge Sources)
                 that communicate via posting hypotheses on a blackboard.
Knowledge Rep.:  Independent KS’s composed of production rules.
Control:         • Asynchronous pattern-invoked knowledge sources
                 • Opportunistic scheduling by first expanding the highest scoring
                   hypothesis
ABF:             33
Accuracy:        90%
Comments:        • Development of blackboard architecture and use of independent
                   cooperating knowledge sources most significant
                 • A parallel processor version has been built to exploit KS modularity

Name/Org:        HARPY, CMU
Domain/Purpose:  AI publications document retrieval
Approach:        Compiled a network of all possible pronunciations of all possible
                 sentences. Paths thru network are “sentence templates.”
Knowledge Rep.:  Precompiled network. Each node is a template of allophones, which
                 when linked form acoustic representations of every possible
                 pronunciation of words in the domain.
Control:         “Beam Search.” No backtracking.
ABF:             33
Accuracy:        95%
Comments:        • Approach cannot easily accommodate pragmatics
                 • Needs a large memory
                 • Sensitive to missing acoustical segments and missing words



TABLE IV-3. Summary of ARP A ’s Speech Understanding Systems (cont.) 


Name/Org 

Domain/ 


Knowledge 




Purpose 

Approach 

Rep. 

Control ABF 

Accuracy 

Comments 

HWIM 

Travel 

• The system 

Uses networks 

• Centralized 196 

44% 

• Speaker 

BBN 

Budget 

extends bottom- 

to represent: 

control using 


Independent 

Management 

up word 

1) trip facts and 

KS’s as sub- 




theories, using 

relations. 

routines. 


• Very slow 



top-down syn- 

2) Lexicon 





tactic and 

3) Phoneme 

• Expands sen- 


• Most difficult 



semantic 

hypotheses 

tences about 


domain in 



components. 

from signal 

the first recog- 


SUR project. 



Verifies hypo- 


nized word in 




thesized words 


sentence 





by generating a 
parameter rep- 
resentation that 
is compared 
with that from 
the actual 
speech input. 


(Island Driving). 




• Uses an ATN 
semantic 
grammar. 
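
The average branching factor (ABF) reported in Table IV-3 can be illustrated with a small sketch. The toy grammar, its vocabulary, and the function name below are hypothetical and are not taken from any of the ARPA SUR systems. The ABF is taken here simply as the mean number of legal successor words over all entries in the grammar; the SUR projects may have weighted it differently.

```python
# Toy grammar: each word maps to the set of words that may legally follow it.
# The vocabulary is invented purely for illustration.
TOY_GRAMMAR = {
    "<start>": {"list", "show"},
    "list": {"papers", "authors"},
    "show": {"papers", "authors", "abstracts"},
    "papers": {"<end>"},
    "authors": {"<end>"},
    "abstracts": {"<end>"},
}


def average_branching_factor(grammar):
    """Mean number of legal successors per word in the grammar."""
    counts = [len(successors) for successors in grammar.values()]
    return sum(counts) / len(counts)


abf = average_branching_factor(TOY_GRAMMAR)
print(f"ABF = {abf:.2f}")
```

A larger ABF means the recognizer must discriminate among more candidate words at each point in the sentence, which is one reason the accuracy figures of the three systems are not directly comparable.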



H. State of the Art 

1. Speech Recognition

Table IV-4 is a summary of a recent Texas Instruments’ study of commercial speech 
recognizers tested on a 20 word vocabulary consisting of the 10 spoken digits “zero” thru “nine” 
and ten command words: start, stop, yes, no, go, help, erase, rubout, repeat and enter. 

In 1982, speaker-dependent connected-word short-string, small vocabulary (approx. 50 words) 
recognizers were commercially available. These could recognize up to 90 wpm of connected 
speech compared to a typical person’s speaking rate of 150 wpm. The vocabulary size is usually 
less than 150 words, but is application dependent. Recognition accuracies of 98% or greater are
being achieved in factory environments. Current turnkey systems are in the $5K to $75K range. 
Consumer product speech-recognizer subsystems for toys, personal computers, voice-controlled 
appliances, etc., cost from $6 to $100. 

Voice recognition systems are here, viable, proven, but still somewhat costly. In industry ap- 
plications, they have demonstrated large increases in productivity. Hundreds of successful in- 
stallations exist today. Plohar (1983) discusses the human factor considerations associated with 
successful applications. 

2. Speech Understanding 

There are no commercial true speech-understanding systems today. However, there are a 
number of U.S. companies working on future commercial systems. 

a. Bell Labs 

Bell Labs has been working on a semantic sentence recognizer and interpreter utilizing a finite state
grammar and a small vocabulary. The intent is to produce an interactive speech understanding 
system for use over the telephone (Levinson and Liberman, 1981). 

b. IBM — T.J. Watson Res. Center 

IBM has had the largest effort in continuous-speech recognition and understanding, capital- 
izing on the HARPY “Beam Search” approach. 

c. Other organizations involved in developing speech understanding systems include BBN. 

TABLE IV-4. T.I.’s Test of Speech Recognizers on Individual Words

                                                    Nominal Price   Nominal Price for
Manufacturer            Model*                      in 1981         Comparable 1983 Model   % Substitutions

Verbex                  1800                        $65K            $19.6K                  0.2
Nippon Electric         DP-100                      $65K            $27K                    1.2
Threshold Technology    T-500                       $12K            $5K                     1.4
Interstate Electronics  VRM                         $2.4K           $2.4K                   2.9
Heuristics              7000                        $3.3K           NA                      5.9
Centigram               MIKE 4725                   $3.5K           NA                      7.1
Scott Instruments       VET/1 (home computer        $.5K            $.9K                    12.6
                        peripheral)

*First two systems are capable of connected speech.
Verbex is the only system having speaker-independent capability.
After Doddington and Schalk (1981).

I. Who Is Doing Speech Recognition Related Work

1. Commercial Organizations
IBM
TI
Bell Labs
Verbex
Nippon Electric
Threshold Technology
Interstate Electronics
Matsushita
Scott Instruments
Sanyo
INTEL
ITT (San Diego)
Fairchild
Hewlett Packard
Haskins Lab
Lincoln Labs
Speech Communications Research Lab
Sperry Univac
Votan
Voice Machine Communications
Voice Processing Corp.
General Instrument - Milton Bradley
Voice Control Systems

2. Universities 
M.I.T. 

C.M.U. 

V.P.I. 

U of CA at Berkeley 

J. Problems and Issues* 

• Speech perception at the acoustic level is a critical factor in achieving advanced recognition 
capability. Current commercial word recognizers have not yet made full use of available 
knowledge. 

• Widespread use of speech recognizers awaits the availability of low cost connected-speech
systems achieving better than 99% accuracy with limited vocabularies (100 words).

• Capabilities of a word recognizer depend on: 

(1) Can it recognize connected speech? 

(2) Is it speaker independent? 

(3) How big a vocabulary can it recognize? 

• The greatest difficulty that speech recognizers have is determining word end-points — the 
source of many word-recognition errors for isolated word recognizers. 

• A major problem is separating linguistically significant variations in the speech signal from 
insignificant variations (such as variations in word pronunciations). 

• Noise is also a major problem in speech recognition, often resulting from actions of the 
speaker himself. 

• Large vocabulary size is a problem to users, who need to remember what the machine can 
recognize. 

• The two main errors made by speech recognizers are: 

(1) Substitution, and 

(2) Rejection 

• Other less common errors are insertion and deletion. 

• There are as yet no standards for test or evaluation of systems, which is a major problem.

• It is not the number of words that is the major difficulty; it is how close the words are to
each other in sound. The natural English alphabet is a particularly difficult set of sounds to
distinguish.

• Software or hardware may also have idiosyncrasies that adversely affect recognition
performance.

• As recognizer performance improves, evaluation becomes more difficult, because more
testing is required to achieve statistical significance.

• The pronunciation of individual words changes depending on the adjacent words in the
sentence.

• The hypothesize and test approach needs abundant computer power, a major factor
limiting its commercial use.

• Integrating recognizers into an application requires substantial software and human factors
considerations. This has limited real-world adoption.

*These have been gleaned primarily from Doddington and Schalk (1981).
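
The point above about statistical significance can be made concrete with a rough calculation. The sketch below is an illustration under stated assumptions, not a procedure from Doddington and Schalk: it treats each test utterance as an independent trial, uses the normal approximation to the binomial distribution, and asks how many trials are needed to measure an error rate to within a chosen relative margin at roughly 95% confidence. The function name and the default margin are hypothetical.

```python
import math


def trials_needed(error_rate, relative_margin=0.25, z=1.96):
    """Approximate number of independent test utterances needed so that the
    measured error rate lies within +/- relative_margin of the true rate,
    at about 95% confidence (z = 1.96), by the normal approximation."""
    p = error_rate
    half_width = relative_margin * p
    return math.ceil(z * z * p * (1.0 - p) / (half_width * half_width))


for rate in (0.10, 0.01, 0.002):
    print(f"error rate {rate:.1%}: about {trials_needed(rate)} trials")
```

Because the required number of trials grows roughly in inverse proportion to the error rate, a recognizer with a 0.2% substitution rate needs far more test data than one with a 10% rate before two systems can be ranked with any confidence.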

K. Future Trends 

It is anticipated that speaker-independent, continuous-speech recognition systems with limited 
vocabularies (10-20 words), having an accuracy of 98% or better, will be available by the 
mid-1980’s. Automatic dictation will probably not appear before the 1990’s, with Japanese 
language systems being the first to appear. (The Japanese language has only on the order of 500
syllables, compared to 10K for English.) Speech understanding is a major part of the Japanese
5th Generation Computer Project (Feigenbaum and McCorduck, 1983). 

Due to the advancement in VLSI, it is expected that voice recognition chips for toys will soon 
be in the $6 range — $50 for a complete system. 

A strong expectation is that a speech understanding system using a natural language parser will 
be introduced by IBM in the mid-80’s. 

Around 1990, true commercial speech understanding systems, having the capabilities of the 
ARPA SUR systems but operating in near real-time, are expected to appear. 

By 1990, speech recognition and understanding is expected to be a billion dollar a year industry
(Elphick, 1982).

V. SPEECH SYNTHESIS


A. Introduction 

Speech synthesis — speech output from a computer — is an emerging technology whose pro- 
ducts are already becoming commonplace. Though the present market for these devices is still 
small, the future looks very bright. 

Speech synthesis is not normally considered an AI topic, though it is sure to play an important 
part in many future AI systems, particularly when coupled with speech understanding. One may 
very well consider these synthesis systems, which employ rules (often heuristic) for deriving 
speech from stored speech elements, as an example of an “expert system on a chip.” 

B. Why Synthesis 

One approach to making speech available when needed is to record the speech and play it back
as required. The disadvantages are that mechanical devices are often unreliable and that the
ability to generate new sentences from stored words is quite limited because of access time. This
approach is therefore unsuitable for most computer-based applications.

A more reliable approach is to use digital sound recording techniques, enabling speech to be 
stored in solid-state memories having no moving parts to break down. The disadvantage is that an 
enormous amount of storage is required, on the order of 50,000 bits per second of digital speech
(at the typical speaking rate of 150 words per minute). However, if words are represented by the 
digital code for their letters, the same information requires only about 100 bits per second of 
speech. This two to three orders of magnitude difference highlights the importance of speech 
compression for any digital representation of speech, not only to save storage requirements, but 
also to vastly reduce the bandwidth required for electronic speech transmission. All speech syn- 
thesis methods use some form of speech compression. 
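
The storage comparison above can be checked with a little arithmetic. In the sketch below, the speaking rate and the 50,000 bits per second figure come from the text; the characters per word and bits per character are assumptions made only for illustration.

```python
WORDS_PER_MINUTE = 150   # typical speaking rate quoted in the text
CHARS_PER_WORD = 6       # assumption: about 5 letters plus a space
BITS_PER_CHAR = 7        # assumption: 7-bit ASCII character code
DIGITIZED_BPS = 50_000   # digitized-speech figure quoted in the text

words_per_second = WORDS_PER_MINUTE / 60
text_bps = words_per_second * CHARS_PER_WORD * BITS_PER_CHAR
ratio = DIGITIZED_BPS / text_bps

print(f"text rate: {text_bps:.0f} bits per second")
print(f"digitized speech needs about {ratio:.0f} times more")
```

Under these assumptions the text representation needs roughly 100 bits per second, and the ratio falls between 100 and 1000, which is the "two to three orders of magnitude" cited.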

Speech synthesis serves three basic purposes: 

1) Recreating speech from a compressed speech representation 

2) Generating speech from stored speech elements such as by concatenating representations for 
words, and 

3) Generating speech from text. 

The first purpose is associated with minimizing storage or transmission bandwidth
requirements, the second with creating speech from stored components under microprocessor or
computer control, and the third with reading machines and computer-human interaction.

An indication of applications of speech synthesis is given in Table V-1.

TABLE V-1. Applications of Speech Synthesis.

Military

• Operation of military equipment

• Warnings

• Reminders

• Service and operation aids

• Trainers and simulators

• Secure communications

Computer

• Communication by computers to users.

Consumer

• Talking appliances

• Teaching devices

• Toys

• Talking typewriters and calculators

• Talking watches

• Automobile warning devices, reminders, and annunciators for instruments

• Devices for the blind

• Communication for the speech handicapped

Telecommunications

• Synthesized telephone messages

• Speech compression for “store and forward,” to reduce communication costs

• Vocal delivery of electronic mail

Industrial

• Speaking instruments

• Speaking cash registers

• Alarm systems

• Automated office equipment

• Industrial process control

• Station and floor announcers for trains, buses, elevators, etc.

• Systems operations where the operators have their visual attention elsewhere

• Emergency warning devices for airplanes, machines, etc.

• Control room annunciators for sensors

• Text readers

• Data entry (with vocal verification)

C. Human Speech

As many speech synthesizers actually employ an approximate simulation of the human speech
production mechanism, it is helpful to briefly review human speech and its generation. Human
speech consists basically of a combination of vocal sounds such as vowels, fricative sounds such
as f, th or sh, and plosive or stop consonant sounds such as b and d.

The human vocal tract can be considered as an acoustic tube terminated at one end by the vocal 
cords and at the other end by the lips. This resonant tube has a side branch — the nasal 
resonator — separated by a flap called the velum. 

Voiced sounds are produced by forcing air from the lungs past the tensed vocal cords which 
are thus forced to vibrate, emitting puffs of air into the vocal tract. (The puff frequency — about 
100 hertz in males, 200 hertz in females — is a function of the vocal cord size and tenseness.) 
These puffs of air excite the vocal tract, stimulating its resonant (formant) frequencies. Most of
the resulting sound energy is contained in these resonant responses, the frequencies of which can be
varied by changing the shape of the vocal tract by moving the lips, jaw or tongue.

Fricative sounds occur when a constriction in the vocal tract leads to turbulent air flows after 
the constriction. 

Plosives are generated by briefly closing the vocal tract until pressure builds up and then releas- 
ing the pressure. 

D. Electronic Simulation of the Speech Mechanism 

The three basic human speech sounds can be electronically simulated as follows, as illustrated
by the Computalker Consultants Model CT-1* synthesizer shown schematically in Figure V-1.

Voiced sounds can be simulated by passing energy from a variable periodic source (corresponding
to the vocal cord puffs) through a series of variable filters (f_1, f_2, f_3) corresponding
to the vocal tract resonances (formants). Plosive sounds are produced the same way, but require
rapid changes in the amplitude parameters A_0 and A_n. Fricative sounds are produced by passing
white noise through a variable filter (f_f). Some sounds, such as v and z, are produced using both
the periodic and noise mechanisms.

Using this approach, human speech can be simulated by controlling the frequency parameters
(f_j) and the amplitude parameters (A_j) over time. Some variant of this basic method, referred to
as parametric coding, is used in all speech synthesizers that simulate human speech production.
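
The voiced-sound path described above (a periodic source passed through a series of formant filters) can be sketched in a few lines. This is a minimal illustration, not the CT-1 circuit: the sample rate, the formant frequencies and bandwidths, and all function names are assumptions, and each formant filter is modeled as a standard two-pole digital resonator. The noise path for fricatives is omitted.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumed value)


def resonator(signal, freq_hz, bandwidth_hz):
    """Two-pole digital resonator standing in for one formant filter."""
    t = 1.0 / SAMPLE_RATE
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # normalizes the gain to 1 at zero frequency
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def pulse_train(pitch_hz, n_samples):
    """Periodic source: one unit pulse per pitch period (the 'puffs')."""
    period = round(SAMPLE_RATE / pitch_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def voiced_sound(pitch_hz, formants, amplitude, n_samples):
    """Pass the periodic source through the formant filters in series."""
    signal = pulse_train(pitch_hz, n_samples)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz)
    return [amplitude * s for s in signal]


# Illustrative (assumed) settings: 100 Hz pitch and three formants given
# as (center frequency in Hz, bandwidth in Hz) pairs.
samples = voiced_sound(100, [(700, 80), (1100, 90), (2500, 120)], 1.0, 800)
print(len(samples), max(abs(s) for s in samples))
```

Varying the formant frequencies and the amplitude from one short frame to the next is what turns this steady tone into the time-varying parameter control that the text calls parametric coding.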

E. Synthesis in Speech Compression and Regeneration 

Synthesis has the role of regeneration in speech compression schemes (associated with speech 
storage or minimal bandwidth speech transmission). 

There are two basic speech compression techniques — frequency domain analysis (parametric 
coding as discussed in the previous section on electronic simulation), and time domain analysis. 
Frequency domain methods tend to dominate commercial speech synthesis, but time domain 
analysis has become important for limited-vocabulary word synthesis. 

The frequency domain approach analyzes the incoming speech to be compressed and generates 
the parameters needed for regenerating the signal using an electronic simulation of the vocal 
tract. In some cases, these parameters may be further compressed for reduced storage. Speech is 
generated by inverting the process as indicated in Figure V-2. 

Time domain analysis is characterized by waveform compression techniques. Waveform
digitization coding, researched extensively by Bell Labs, takes the original waveform of spoken
words and compresses it using a complicated algorithm. The final compressed waveform is
stored as bits in memory for later reconstruction of the original waveform. Though generally
producing better sounding speech than parametric coding, waveform digitization coding requires two
to four times as much storage as that needed for parametric coding.

*No longer in production, but the Phillips speech chip essentially does the same thing.

Figure V-1. A Simplified Diagram of the Computalker CT-1 Parametric Synthesizer.

Figure V-2. Recording and Reproduction of Speech Using a Compressed-Speech System
(A. Recording; B. Playback). After Sherwood (1979).

F. Parametric Coding Schemes 

1. Introduction

All frequency domain compression techniques employ some sort of electronic model of the 
human vocal tract. Thus, all have one or more filters to simulate vocal tract resonances, and 
periodic and noise energy sources, and are controlled by varying the parameters associated with 
pitch, loudness, and filter frequencies. 

2. Formant Coding 

This is a straightforward approach to controlling an electronic model of the vocal tract by con- 
trolling the tunable filters using parametric signals that represent the formant (vocal tube reso- 
nant) frequencies such as those shown in Figure V-l. As the formant frequencies change relatively 
slowly, the parameters need to be updated relatively infrequently, thus allowing data compres- 
sion. 

3. Linear Predictive Coding (LPC) 

LPC, pioneered by TI for “Speak and Spell,” is a form of formant coding which allows further
compression of the parameters. As the formant frequencies tend to change slowly, current
samples are predicted from weighted linear combinations of previous samples. The clever
prediction approach of TI’s LPC, together with an ingenious lattice filter, greatly simplifies the
synthesis circuitry. The resulting system fits on a single chip and produces high quality, natural
sounding speech.
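
The idea of predicting the current sample from a weighted linear combination of previous samples can be shown with a small sketch. The weights and test signal below are invented for illustration and are not taken from TI's design; a real LPC coder would estimate the weights from each frame of speech and would transmit them along with pitch and gain.

```python
def predict(previous, coefficients):
    """Predict the next sample as a weighted sum of the most recent samples.
    coefficients[0] weights the most recent sample."""
    recent = previous[-len(coefficients):][::-1]
    return sum(c * s for c, s in zip(coefficients, recent))


def residual(samples, coefficients):
    """Prediction error for each sample that has enough history."""
    order = len(coefficients)
    return [samples[n] - predict(samples[:n], coefficients)
            for n in range(order, len(samples))]


# A signal obeying s[n] = 1.8*s[n-1] - 0.9*s[n-2] (a decaying oscillation)
# is predicted by the same two weights, leaving essentially no error.
signal = [1.0, 1.8]
for _ in range(20):
    signal.append(1.8 * signal[-1] - 0.9 * signal[-2])

errors = residual(signal, [1.8, -0.9])
print(max(abs(e) for e in errors))
```

When the prediction is good, only the small prediction error (or, in a synthesizer, only the weights and the excitation) needs to be stored, which is where the compression comes from.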

4. PARCOR 

PARCOR (partial correlation), utilized by Japanese manufacturers, is a variant of LPC. LPC 
extrapolates from a series of formant samples to predict following formant frequencies. Though 
most speech patterns change slowly, plosive and fricative sounds involve rapid changes. PARCOR
makes LPC more sensitive to sudden changes by giving greater emphasis to the correlation
between adjacent parametric samples and less to the longer term patterns. However, there
appears to be little resulting subjective difference in observed speech quality between the two
approaches.

5. Line Spectrum Pair (LSP)

NTT (Nippon Telegraph and Telephone Public Corp.), which developed PARCOR, has come
up with LSP, an approach allowing still further compression. LSP defines the boundary conditions
for the individual formant frequencies as those corresponding to the open and closed vocal
tract. NTT claims that for a complete system, some 40% more compression can be achieved with
LSP than with PARCOR, while maintaining nearly the same speech quality.

6. Parametric Waveform Coding (PWC)

PWC is another variant of LPC, as used by Centigram’s Voice Ware system to produce 
vocabularies for the Lisa Speech Board.* PWC uses a variable-length slice of waveform to pro- 
duce the linear prediction coefficients. Each slice (about 20 milliseconds in length) corresponds to 
a “glottal event” — the event associated with each puff of air passing through the vocal tract. 
Voice Ware uses an array processor to determine 13 linear prediction coefficients for each glottal 
event. To synthesize speech, the Lisa Speech Board uses these coefficients and the lengths of the 
events to recreate speech waveforms as in other LPC synthesizers. The PWC approach tends to 
yield more natural speech than the simpler LPC systems, but requires a higher data rate. 

G. Waveform Coding Schemes 
1. ADPCM 

Digitizing speech at an 8 kHz sampling rate with a 4 bit sample size results in 32,000 bits per
second (bps) when using adaptive differential pulse code modulation (ADPCM), which has been
proposed as the preferred worldwide method of digitizing voice telephone signals for long distance
transmission. In ADPCM, the digitized speech is encoded in terms of the amplitude differences
between adjacent samples. These differences are adaptively encoded in terms of quantization level
(a function of the previous quantization level and the previous PCM value). A close relative of
ADPCM is CVSD (continuously variable slope delta modulation).
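
The following is a much simplified sketch of the adaptive differential idea, not the telephone-standard ADPCM algorithm. Each sample is coded in 4 bits (a sign bit plus a 3 bit magnitude) as the difference from the decoder's running estimate, and the quantizer step size is adapted from the previous code. The step-size multipliers, limits, starting step, and test tone are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 4
print(SAMPLE_RATE_HZ * BITS_PER_SAMPLE, "bits per second")

# Step-size multipliers indexed by code magnitude (0..7). These values are
# illustrative assumptions, not those of any standard.
STEP_SCALE = [0.9, 0.9, 0.9, 0.9, 1.2, 1.6, 2.0, 2.4]
MIN_STEP, MAX_STEP, START_STEP = 1.0, 2048.0, 16.0


def _update(predicted, step, code):
    """Shared by encoder and decoder so that both track the same state."""
    magnitude = code & 7
    delta = (magnitude + 0.5) * step
    if code & 8:  # sign bit set: the difference was negative
        delta = -delta
    predicted += delta
    step = min(MAX_STEP, max(MIN_STEP, step * STEP_SCALE[magnitude]))
    return predicted, step


def encode(samples):
    """Turn PCM samples into 4-bit codes of adaptively quantized differences."""
    predicted, step, codes = 0.0, START_STEP, []
    for sample in samples:
        diff = sample - predicted
        magnitude = min(7, int(abs(diff) / step))
        code = magnitude | (8 if diff < 0 else 0)
        codes.append(code)
        predicted, step = _update(predicted, step, code)
    return codes


def decode(codes):
    """Rebuild an approximation of the waveform from the 4-bit codes."""
    predicted, step, out = 0.0, START_STEP, []
    for code in codes:
        predicted, step = _update(predicted, step, code)
        out.append(predicted)
    return out


# A 200 Hz test tone with an amplitude of 1000 (arbitrary units).
tone = [1000.0 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE_HZ)
        for n in range(400)]
codes = encode(tone)
rebuilt = decode(codes)
mean_error = sum(abs(a - b) for a, b in zip(tone, rebuilt)) / len(tone)
print(f"mean absolute error: {mean_error:.1f}")
```

Because the decoder repeats exactly the state updates the encoder performed, only the 4 bit codes need to be transmitted or stored.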

2. Mozer’s Waveform Coding 

Though ADPCM is suitable for telephone transmission, its high bit rate is unsuitable for stored 
speech synthesis. A scheme by Dr. Forrest Mozer of the University of California is a variation of 
ADPCM which provides substantial further compression. This technique has been incorporated 
into National Semiconductor Corporation’s Digitalker. Dr. Mozer’s approach is to:

1) Analyze the waveform to detect short periods with little change. The waveforms for these
periods are then replaced with identical waveforms.

2) Fourier analyze the signal and adjust the phase angle of each Fourier component to produce 
a symmetrical waveform and then discard half. 

3) Discard low amplitude portions of the waveform which are not heard by the ear. 

4) Employ ADPCM to further reduce data. 

The net result of these actions is more than a 40 to 1 reduction in the data that needs to be
stored, as compared with the data in direct digitization. To produce speech, the process is inverted.
Though the resultant signals look little like the original, the result is very good speech
reproduction.

H. Coding the Words To Be Stored 

Though the schemes discussed thus far provide a huge reduction in the storage required,
generating the required custom vocabulary in terms of the stored parameters requires
hand tailoring by an expert. As yet, there is no acceptable automatic mechanism for directly
converting speech into satisfactory storage elements for encoding schemes that provide high data
compression. (ADPCM is automatic. Parametric schemes can be automated with small residual
errors.)

*No longer in production.

Developing the vocabulary for the Mozer Waveform Coding, used in National 
Semiconductor’s Digitalker, takes about one hour of processing per word. It involves working 
with the data compression and zero phase-encoding algorithms, that produce the stored bit pat- 
terns, making it very difficult for users to program their own custom vocabularies (Ciarcia, 1983). 

To enable users to develop their own custom vocabularies for their products, when large 
vocabularies are required, Centigram Corp. has offered as a product their Voice Ware develop- 
ment system. With it, users can input tape recorded voice to a digitizer that supplies a 4800 bps 
data stream to a microprocessor-based CRT-terminal work station. The station converts the 
signal into parametric waveform coding (PWC). The user can then edit the messages, combine 
them into files, and feed them back through the Lisa synthesizer to hear how they sound. If the 
sound is unsatisfactory, particularly for concatenated phrases, the phrases can be rerecorded to 
achieve the desired continuity and balance. 

In general, synthesizer users requiring a small custom vocabulary customarily contract with
the synthesizer manufacturer or another development source for the words required. This cost
is on the order of $100 per word for LPC chips.

I. Generating Speech from Text 

English has some 40 basic speech sounds called phonemes, corresponding to 16 vowel sounds, 6 
stops, 8 fricatives, 3 nasals (such as ng), 4 liquids/glides (such as l in lice) and 3 others (such as ch
in church). These sounds vary somewhat depending upon how they are combined into words or 
used in speech. These phoneme variations are called allophones. (Texas Instruments developed a 
set of 128 allophones to characterize English speech.) Allophones and the rules to string them 
together can be stored in computer memory chips. The first text-to-speech system used a 
phonemic synthesizer (Votrax). Votrax utilized a hard-wired phonemic to parameter converter 
which then fed a formant synthesizer to create speech. A simplified text-to-speech system 
schematic is given in Figure V-3. 

The highly-intelligible state-of-the-art speech synthesizer, the Speech Plus “Prose 2000,”
utilizes a generation approach consisting of five serial processes: 1) Text normalization, 2)
Phonemics, 3) Allophonics, 4) Prosodics, 5) Parameter generation. For words not in the
exceptions lexicon, the phonemics process is implemented as a real-time expert system consisting
of a small rule interpreter and an ordered set of about 400 context-sensitive rules.

Figure V-3. Text to Speech Synthesis.
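
The combination of an exceptions lexicon with an ordered set of context-sensitive rules can be sketched as follows. This is a toy illustration and not the Prose 2000 rule set: the lexicon entries, the handful of rules, the phoneme labels, and the function name are all hypothetical, and a practical system would need hundreds of rules.

```python
# Hypothetical exceptions lexicon: words whose pronunciation the rules miss.
EXCEPTIONS = {"one": "W AH N", "two": "T UW"}

# Ordered rules: (left context, letters, right context, phonemes).
# An empty context matches anything; "#" marks a word boundary.
# More specific rules come first, since the first match wins.
RULES = [
    ("#", "kn", "", "N"),   # word-initial "kn" as in "knit"
    ("", "ch", "", "CH"),
    ("", "sh", "", "SH"),
    ("", "th", "", "TH"),
    ("", "c", "e", "S"),    # "c" before "e" as in "cent"
    ("", "c", "i", "S"),
    ("", "c", "", "K"),
    ("", "a", "", "AE"),
    ("", "e", "", "EH"),
    ("", "i", "", "IH"),
    ("", "k", "", "K"),
    ("", "n", "", "N"),
    ("", "p", "", "P"),
    ("", "s", "", "S"),
    ("", "t", "", "T"),
]


def to_phonemes(word):
    """Look the word up in the lexicon, else apply the rules left to right."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word].split()
    text = "#" + word + "#"
    position, phonemes = 1, []
    while position < len(text) - 1:
        for left, letters, right, output in RULES:
            end = position + len(letters)
            if (text.startswith(letters, position)
                    and (not left or text[:position].endswith(left))
                    and (not right or text.startswith(right, end))):
                phonemes.extend(output.split())
                position = end
                break
        else:
            raise ValueError(f"no rule for {text[position]!r} in {word!r}")
    return phonemes


for example in ("cat", "cent", "ship", "knit", "one"):
    print(example, to_phonemes(example))
```

Ordering matters: the rule for "c" before "e" must precede the general "c" rule, just as the lexicon lookup must precede all rules.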

J. State of the Art 

Elphick (1981, p. 42) notes that: 

Most commercial synthesizers, especially low-cost ones used for consumer products, derive their speech elements 

from recordings of actual human speech. The recorded speech patterns are compressed, and the speech is 

disassembled into a vocabulary of small elements for later reassembly into messages. 

High quality speech by phoneme synthesizers has been achieved in research systems, but not in 
commercial systems. The most natural commercial speech synthesizers use the waveform ap- 
proach. 

Figure V-4 is an indication of speech quality versus bit storage requirements for the various syn- 
thesis techniques. 

Thus far, in industrial applications, only short messages are practical, as prolonged listening to 
synthetic speech tends to fatigue the operators (Andreiev, 1981). 

Speech chips with limited vocabularies are available in the range of $10 and up. Constructing
the initial representations for new words (to be stored in ROMs) costs upward of tens of dollars
per word.



Figure V-4. Speech Quality Versus Bit Rate for Various Coding Schemes.

Figure V-5. Text to Speech Conversion. English text (in ASCII code) is converted to a synthetic
speech waveform, which a D/A converter, amplifier and speaker turn into the speech signal.
After Zue (1982).

Programming advanced speech synthesizers, to be used with speech generation from text, is an
enormous task. The flow diagram for such a state of the art system is given in Figure V-5. First, 
the printed text must be converted into phonemes by using a combination of rules and a stored 
pronouncing dictionary, taking into account pitch, intensity, and duration associated with em- 
phasis, as influenced by word use determined by the syntax of the sentence. The resultant 
allophones (phonemic variations) are then fed to a phonemic voice synthesizer. 

The major commercial application thus far for speech generation from text is reading systems 
for the blind. These products input text using optical character recognition, and output speech us- 
ing a text-to-speech synthesizer. Other applications include electronic mail-to-voice, and proof- 
reading. 

K. Some Available Commercial Systems 

An indication of manufacturers and currently available commercial systems is given by Table 
V-2. 


L. Problems and Issues 

• There is a tradeoff in system design between speech quality, vocabulary size, and cost. 

• There is the problem of how best to choose the fundamental units to be used: allophones,
syllables, or words. The smaller units permit very large vocabularies without excessive storage
requirements, while the larger units (such as phrases) provide superior speech quality.

• Memory cost considerations tend to restrict the use of the word synthesis approach. 

• As the synthesizer techniques improve, it may be that errors due to low sampling rates, and 
inadequate consideration of coarticulation and prosodic (speech stress) effects may be the 
limiting factors. 

• Speech compression techniques are crucial to minimize memory requirements in the syn- 
thesizer. 

• The high cost of generating words for synthesizer vocabularies needs to be reduced. 

• Similarly, the high cost of storing words in ROM needs to be addressed. 

• Updating stored vocabularies is problematical due to the need to keep the same speaker 
available. 

M. Forecast 

Though the market for voice synthesizers is still relatively small, it is estimated that it will be 
close to one-half billion dollars by 1985 and will reach several billion dollars by 1990. Talking 
devices will have a big impact on industrial operations, a major effect on learning devices, and 
will probably be ubiquitous throughout home and consumer products. These devices will be a 
boon to the handicapped, in everything from talking typewriters and appliances, and reading 
machines for the blind, to speech prosthetics. It is also anticipated that these devices will be found 
virtually everywhere in vehicles and transportation systems. 

Because of their integration into single chips, the cost of stored vocabulary devices will con- 
tinue to drop so that basic hardware costs of less than $10, for units having vocabularies of 
several hundred words, are foreseen by the end of this decade. 
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TABLE V-2. Some Available Commercial Synthesizer Systems 


Manufacturer 

Model 

Cost 

Type 

Comments 

Votrax 
(Troy, MI) 

VSM/I 

$995 

Formant 

A singleboard complete system 
incorporating a programmable memory 


SVA 


Formant 

Singleboard synthesizer for unlimited 
text to speech (no internal word storage) 


SC-02 


Formant 

Synthesizer chip with phoneme library 

Speech Plus, Inc. 
(Mt. View, CA) 

Prose 2000 

$3500 

Formant 

Singleboard system achieving an unlimited 
vocabulary capability by using 400 rules and 
a 3000 word exceptions lexicon. For use with 
text. 


Speech 1000 

$1200 

LPC 

Synthesizer board with up to 6 minutes 
stored vocab. 

Texas Instruments (TI) 
(Dallas, TX) 

TMS 5220 
TMS 6100 

$5 

$5 

LPC 

LPC 

Single chip voice synthesizer processor. 
Single chip voice synthesizer memory. 


Speech synthesizer 
for TI 99/4A 
Personal Computer 

$100 

LPC 

Text to speech implemented in 99/4A. 


TM 990/306 


LPC 

Speech module (does not have unlimited 
vocabulary capability of formant systems). 

National Semiconductor 
(Santa Clara, CA) 

Digitalker 
.MM 54104 


Mozer’s 

Waveform 

Digitizer 

Single chip with 256 possible addressable 
expressions. 

Centigram 
(Sunnyvale, CA) 

GIM 

$350 

Formant 

An SBX module using the GI250 synthesizer 
chip. 


SYBIL 

$495 

Formant 

A single channel synthesizer for 


the IBM PC. 



TABLE V-2. Some Available Commercial Synthesizer Systems (cont.)

Manufacturer              Model                 Cost               Type            Comments

Kurzweil Computer         Reading               $30,000            Formant         Uses Speech Plus Prose 2000 synthesizer.
Products                  Machine for
(Cambridge, MA)           Blind

American                  53610                 *                  LPC
Microsystems              53620                 *                  LPC
(Santa Clara, CA)

General Instruments       Allophone                                LPC             Annunciates 64 Allophones
(Hicksville, N.Y.)        Synthesis
                          Module
                          SP250                 *                  Formant         Single channel synthesizer
                          SP256                 *                  Formant         Single channel synthesizer, with
                                                                                   microprocessor control

Hitachi                   HD 38880              *                  PARCOR          Uses Partial Autocorrelation

Nippon Electric Corp                                               PARCOR          (closely related to LPC)

Sanyo                     LC 1800               *                  PARCOR

Mitsubishi                M58817                *                  PARCOR

Matsushita                                      *                  LPC
(Japan)

Master Specialties        1650                  $500               Word Synthesis
(Costa Mesa, CA)                                + Vocabulary
                                                at $50/word

Intex Micro Systems       Intex-Talker                             Text-to-Speech  Uses a text-to-phoneme algorithm
(Troy, N.Y.)                                                       Synthesizer     and a Votrax SC-01 chip.

Motorola                                                           CVSD            Encoder and Decoder Chips.

Philips/Signetics         MEA 8000              *                  Formant
                          MEA 10000             *                  Formant

OKI Semiconductor                                                  ADPCM           Encoder and Decoder Chips.

*Chip prices range from $3 to $15 depending on model and quantity. Speech Plus provides custom vocabulary generation services for speech
synthesizer chips at $100/word.
VI. PROBLEM SOLVING AND PLANNING


A. Introduction 

Nilsson at SRI originally specified problem solving and planning as one of the four
fundamental application areas of AI. However, the “weak methods,” employing little domain
knowledge, originally used in AI for problem solving and planning, proved inadequate for
complex real-world problems. Thus, in seeking solutions in this area, larger amounts of knowledge
have since been utilized. The net result has been that the “Knowledge Engineering” methodology
used for Expert Systems has been adapted for use in problem solving and planning. Thus, the
boundary between problem solving and planning and expert systems has faded, and it is now
common to refer to all these knowledge-based activities as expert systems; they are therefore
covered in the expert systems volume of this series. Nevertheless, this chapter will briefly review
some of the earlier, less knowledge-intensive systems and several examples of recent systems.

B. Planning Defined 

Most AI applications can be considered examples of problem solving, and these are well
covered in the other AI application areas: Expert Systems, Computer Vision, Language
Understanding, etc. In this chapter we will consider only planning systems. Planning can be
defined for our purposes as the design process for selecting and stringing together individual
actions into sequences in order to achieve desired goals.

C. Basic Planning Paradigm 

Wilensky (1983) outlines the basic structure of plans from the viewpoint of common-sense 
problem solving and natural language understanding. A schematic for Wilensky’s basic planning 
paradigm is given in Figure VI-1. In this paradigm, the planner recognizes from the environment 
that a new situation has arisen which merits a goal. The planner then retrieves from memory a 
plan that might be used to achieve this goal, or generates a new trial plan if no existing plan is 
suitable. This candidate plan is then projected forward (via simulation) to observe the outcome. 
This outcome is examined to see if there are any conflicts that will arise in achieving other goals if 
this plan is pursued. If not, this and other candidate plan outcomes are evaluated and the 
maximum-valued plan is chosen. The plan, when implemented, will modify the current state-of- 
affairs. This impact, together with any other changes in the environment, results in a new world 
model with new situations that may merit new goals, so that the cyclic process of planning con- 
tinues. When candidate plans are being considered, if the candidate plan overlaps existing plans 
for other goals, these overlapping plans may be merged to conserve resources. 
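
The project-check-evaluate portion of this cycle can be sketched in a few lines of Python. This is only an illustrative sketch, not Wilensky's implementation: the names `CandidatePlan` and `choose_plan`, the set-of-facts state representation, and the toy situation are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional

State = FrozenSet[str]


@dataclass(frozen=True)
class CandidatePlan:
    name: str
    project: Callable[[State], State]  # simulates the plan and returns the projected outcome


def choose_plan(
    state: State,
    candidates: Iterable[CandidatePlan],
    other_goals: Iterable[Callable[[State], bool]],
    value: Callable[[State], float],
) -> Optional[CandidatePlan]:
    """Project each candidate, drop those whose outcome conflicts with another goal,
    and return the maximum-valued survivor (None if every candidate conflicts)."""
    goals = list(other_goals)
    best, best_value = None, None
    for plan in candidates:
        outcome = plan.project(state)
        if not all(goal(outcome) for goal in goals):
            continue  # projected outcome violates another goal
        v = value(outcome)
        if best is None or v > best_value:
            best, best_value = plan, v
    return best


# Hypothetical situation: the new goal is to stop being hungry, while a
# preservation goal says the planner should keep its money.
state = frozenset({"hungry", "has_money"})
eat_out = CandidatePlan("eat_out", lambda s: (s - {"hungry", "has_money"}) | {"fed"})
cook = CandidatePlan("cook", lambda s: (s - {"hungry"}) | {"fed"})
keep_money = lambda s: "has_money" in s
fed_value = lambda s: 1.0 if "fed" in s else 0.0

chosen = choose_plan(state, [eat_out, cook], [keep_money], fed_value)
print(chosen.name)
```

Here `eat_out` is rejected because its projected outcome violates the preservation goal, so `cook` is selected.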

A basic problem in planning is that of conflicting goals. The causes of conflicting goals are in- 
dicated in Figure VI-2. (A preservation goal is a goal to preserve an already existing condition, or 
is a goal not to undo a desirable state or goal resulting from another plan.) 
Figure VI-1. Wilensky Planning Paradigm.

Figure VI-2. Nature of Goal Conflicts.



Problems arising from conflicting goals are dealt with by replanning or by eliminating the fac- 
tors causing the goal conflicts. A flow diagram for resolving goal conflicts is given in Figure 
VI-3. If the goal conflicts cannot be completely resolved, then partial fulfillment of goals may be 
attempted or goals of lesser importance may have to be dropped. The global strategy is to achieve
as many goals as possible, to maximize the composite value of the goals achieved, and to avoid
wasting resources in achieving them.
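
When conflicts stem from a shared, limited resource, the global strategy above amounts to choosing the subset of goals with the greatest composite value that still fits the resource budget. The following brute-force sketch illustrates that idea only; the function `best_goal_subset`, the (value, cost) encoding, and the example goals are assumptions made for illustration and do not come from Wilensky or DEVISER.

```python
from itertools import combinations


def best_goal_subset(goals, budget):
    """goals maps a goal name to (value, cost).
    Returns (subset, total_value, total_cost) with the highest total value whose
    cost fits the budget; ties are broken in favor of the cheaper subset."""
    names = sorted(goals)
    best = ((), 0, 0)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            value = sum(goals[g][0] for g in subset)
            cost = sum(goals[g][1] for g in subset)
            if cost <= budget and (value, -cost) > (best[1], -best[2]):
                best = (subset, value, cost)
    return best


# Hypothetical goals competing for 8 units of a single resource.
goals = {
    "image_planet": (10, 6),
    "measure_field": (6, 4),
    "relay_data": (5, 3),
}
best = best_goal_subset(goals, budget=8)
print(best)
```

In this example the single most valuable goal is dropped, because the two lesser goals together yield a higher composite value within the budget. Exhaustive enumeration is exponential in the number of goals, so it is suitable only for small illustrations.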

DEVISER (Vere, 1983) is a good example of a planning program designed to deal with conflict- 
ing goals resulting from resource and time constraints. 

Wilensky also discusses “competing goals” that arise in competitive situations. The planning 
strategies given in this case are to: 

1) Avoid conflicts 

2) Outdo an opponent 

3) Hinder an opponent 

4) Induce alterations in competitive plans. 

D. Paradigms for Generating Plans 

The major issue in any planning system is reducing search. The other key issue is how to handle 
interacting subproblems. The following paradigms are different approaches to addressing these 
issues. 

Cohen and Feigenbaum (1982) discuss four distinct approaches to planning: nonhierarchical, 
hierarchical, script-based (skeletal) and opportunistic. Virtually all plans, both hierarchical and 
nonhierarchical, have hierarchical subgoal structures. That is, each goal can be expanded into 
several subgoals, which themselves can be further expanded, etc. until the bottom level consists of 
operators needed to achieve the lowest level goals. The distinction between hierarchical and 
nonhierarchical planners is that “. . .a hierarchical planner generates a hierarchy of representa- 
tions of a plan in which the highest is a simplification, or abstraction of the plan and the lowest is 
a detailed plan, sufficient to solve the problem. In contrast, nonhierarchical planners have only 
one representation of a plan.” (pp. 516-517) 

1. Nonhierarchical Planning 

Nonhierarchical planning does not initially distinguish between important and unimportant
actions, so everything is considered in the initial plan, including cumbersome details. For
complex problems, this often results in a large search. One way the search can be greatly reduced
is by initially assuming that the subgoals are independent and then repairing the plan to account
for the interactions (as in HACKER, Table VI-1-2).

A knowledge based approach used in ISIS-II (Fox et al., 1982) is to prune the search space prior 
to search by using constraints, and then narrow the space actually searched by using a “beam 
search” approach. 
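
A generic beam search can be sketched as follows. This is a minimal illustration of the technique, not the ISIS-II implementation: the function signature, the job durations, and the scoring function (negative sum of completion times) are assumptions chosen to keep the example small.

```python
import heapq


def beam_search(successors, score, is_complete, width, start=()):
    """Extend partial paths step by step, keeping only the `width` best-scoring
    partial paths after each extension."""
    beam = [start]
    while beam:
        complete = [path for path in beam if is_complete(path)]
        if complete:
            return max(complete, key=score)
        expanded = [path + (step,) for path in beam for step in successors(path)]
        beam = heapq.nlargest(width, expanded, key=score)
    return None  # no complete path could be built


# Hypothetical one-machine sequencing problem: order three jobs so that the
# sum of their completion times is as small as possible.
durations = {"A": 3, "B": 1, "C": 2}


def successors(path):
    return [job for job in sorted(durations) if job not in path]


def score(path):
    elapsed, total = 0, 0
    for job in path:
        elapsed += durations[job]
        total += elapsed
    return -total  # higher score means a better schedule


def is_complete(path):
    return len(path) == len(durations)


schedule = beam_search(successors, score, is_complete, width=2)
print(schedule)
```

Because only `width` partial paths survive each step, the search is bounded, but a good path can be discarded early; the result is therefore not guaranteed to be optimal in general.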

Figure VI-3. Resolving Conflicting Goals by Replanning (and/or Attacking Factors Causing
Conflicts).
[Flow diagram. Meta themes: (1) achieve as many goals as possible, (2) maximize value of goals
achieved, (3) don't waste resources. Steps shown for a plan having goal conflicts: make
attribution for cause of failure; knowledge-based backtracking, using a knowledge base of plans
and their associated attributes, to select a plan without objectionable attributes; select
maximum valued plan; attack attributes causing goal conflicts, either by changing circumstances
(remove a deadline, change the timing of an external event, increase an ability, increase the
capacity of a functional object, abandon a background goal, change situation, change
prohibition) or by undoing the plan component causing the preservation goal; attempt partial
goal fulfillment; select plans to maximize value of goals achieved.]

2. Hierarchical Planning

In this approach, first a high level plan is formulated considering only the important aspects,
then the vague parts of the plan are refined into more detailed subplans. By ignoring the details at
the higher levels, search is vastly reduced. ABSTRIPS (Table VI-1-5) is illustrative of this
approach.
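
One way to picture planning at several levels of abstraction is to give each precondition a criticality value and, at a given level, consider only the preconditions at or above that level. The sketch below illustrates this filtering step only and is not ABSTRIPS itself; the precondition names, the criticality numbers, and both function names are hypothetical.

```python
# Hypothetical criticality assignments: higher numbers are more important.
CRITICALITY = {"in_same_room": 3, "door_open": 2, "next_to_box": 1}

# Hypothetical preconditions of a "push box" operator.
PUSH_BOX_PRECONDITIONS = ["in_same_room", "door_open", "next_to_box"]


def visible_preconditions(preconditions, level, criticality=CRITICALITY):
    """Preconditions that are considered when planning at the given level."""
    return [p for p in preconditions if criticality[p] >= level]


def refinement_schedule(preconditions, criticality=CRITICALITY):
    """Yield (level, preconditions considered), from most to least abstract."""
    levels = sorted({criticality[p] for p in preconditions}, reverse=True)
    for level in levels:
        yield level, visible_preconditions(preconditions, level, criticality)


for level, considered in refinement_schedule(PUSH_BOX_PRECONDITIONS):
    print(level, considered)
```

At the top level only the most critical precondition is planned for; each lower level brings back more detail, so the plan found at one level serves as the outline for the next.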

3. Utilization of Skeleton Plans 

This approach utilizes stored plans which contain the outlines for solving many different kinds 
of problems. The skeleton plans are then filled in for the particular problem being solved. This 
technique has similarities to Schank’s script-based approach to language understanding. KNOBS 
(Engelman et al., 1980, Table VI-1-10), a frame-based planning system for tactical air strikes, is 
an example of a skeletal plan approach. 
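
Filling in a skeleton plan can be sketched as choosing, for each abstract step, the best-rated technique that is applicable in the current problem. The code below is a minimal sketch under that assumption; it is not the KNOBS mechanism, and the function `instantiate`, the travel example, and the ratings are invented for illustration.

```python
def instantiate(skeleton, techniques, applicable, rating):
    """For each abstract step, pick the highest-rated applicable technique.
    Returns a list of (step, technique) pairs."""
    plan = []
    for step in skeleton:
        options = [t for t in techniques.get(step, []) if applicable(t)]
        if not options:
            raise ValueError(f"no applicable technique for step {step!r}")
        plan.append((step, max(options, key=rating)))
    return plan


# Hypothetical skeleton plan for a trip, with candidate techniques per step.
skeleton = ["reach_airport", "fly", "reach_hotel"]
techniques = {
    "reach_airport": ["taxi", "train"],
    "fly": ["direct_flight", "connecting_flight"],
    "reach_hotel": ["taxi", "walk"],
}
unavailable = {"train", "direct_flight"}
ratings = {"taxi": 2, "train": 3, "direct_flight": 5, "connecting_flight": 3, "walk": 1}

trip = instantiate(skeleton, techniques, lambda t: t not in unavailable, ratings.get)
print(trip)
```

The outline of the solution never changes; only the choice of technique for each step depends on the particular problem.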

4. Opportunistic Planning 

Opportunistic planning (Hayes-Roth and Hayes-Roth, 1978) is based on the way that humans
often approach planning. In this approach, the plan is developed piecewise, with parts of the plan 
being developed separately, and then added to, enlarged and linked together as opportunities pre- 
sent themselves. Planning of this sort incorporates both top-down and bottom-up components. 

E. Planners 

In this section we summarize the characteristics of some of the key AI planning systems that
have evolved over the years. Figure VI-4 diagrams the various systems that are reviewed and their
relation to the basic paradigms. Tables VI-1-1 through VI-1-11 outline the systems shown in
Figure VI-4, using the Expert Systems format (Figure I-1) developed in Chapter I. Note that
planners evolve by building on past techniques. For example, DEVISER (Table VI-1-9), the first
planner to deal explicitly with time, is based on NOAH (Table VI-1-6), with facilities added to
keep track of event “windows” and durations. Figure VI-5 presents a simplified flow chart of
DEVISER’s core planning component.

Information on current research in planning is given in Robinson (1983). 

F. Trends 

Automatic Planning is still a difficult task. The current trend is toward the use of knowledge 
engineering to configure planners as expert systems. Thus, knowledge-based planners are in- 
cluded, and further discussed, in the volume on expert systems. 

Another trend is toward increased concern with spatial-temporal planning. This is exemplified 
by Malik and Binford (1983), Allen and Koomen (1983) and Brooks (1983). 
Figure VI-4. Planning Techniques.



TABLE VI-1-1. Planners.

SYSTEM: STRIPS
INSTITUTION: SRI
AUTHORS: Fikes, R.E. and Nilsson, N.J. (1971)

Purpose
• Devises plans for a robot to move objects between rooms.

Approach
• Uses Means-Ends analysis (a particular implementation of GPS).
• Learns by constructing macro-operators (by saving and generalizing plans).

Key Elements of Knowledge Base
• Uses a first-order logic representation of facts (world model).
• List of problem solving operators, together with their necessary preconditions and the
  changes they make in the state (what is added and what is deleted from world model when
  they are applied).

Key Elements of Global Data Base (System Status)
• Goal
• Initial state of the system.
• Operators used thus far
• Current state of the system.

Key Elements of Control Structure
• Means-Ends Analysis
• Depth first search using backtracking as required.
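
The operator representation described in this table (preconditions, an add list, and a delete list) can be sketched directly. Note the substitution: for brevity this sketch uses plain forward depth-first search with backtracking rather than the means-ends analysis that STRIPS actually employs. The names `Operator`, `apply_operator`, and `find_plan`, and the two-room example, are assumptions for illustration.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Operator(NamedTuple):
    name: str
    pre: FrozenSet[str]     # facts that must hold before the operator applies
    add: FrozenSet[str]     # facts added to the world model
    delete: FrozenSet[str]  # facts deleted from the world model


def apply_operator(state: FrozenSet[str], op: Operator) -> FrozenSet[str]:
    return (state - op.delete) | op.add


def find_plan(state, goal, operators, depth, visited=frozenset()) -> Optional[List[str]]:
    """Forward depth-first search with backtracking (not means-ends analysis)."""
    if goal <= state:
        return []
    if depth == 0 or state in visited:
        return None
    for op in operators:
        if op.pre <= state:
            rest = find_plan(apply_operator(state, op), goal, operators,
                             depth - 1, visited | {state})
            if rest is not None:
                return [op.name] + rest
    return None  # dead end: the caller backtracks to its next operator


f = frozenset
operators = [
    Operator("go_A_B", f({"robot_in_A"}), f({"robot_in_B"}), f({"robot_in_A"})),
    Operator("go_B_A", f({"robot_in_B"}), f({"robot_in_A"}), f({"robot_in_B"})),
    Operator("push_A_B", f({"robot_in_A", "box_in_A"}),
             f({"robot_in_B", "box_in_B"}), f({"robot_in_A", "box_in_A"})),
]
steps = find_plan(f({"robot_in_B", "box_in_A"}), f({"box_in_B"}), operators, depth=4)
print(steps)
```

The `visited` set keeps the search from looping between the two rooms, and the depth bound limits how far any one branch is explored before backtracking.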



TABLE VI-1-2. Planners.

SYSTEM: HACKER
INSTITUTION: M.I.T.
AUTHORS: Sussman, G.J. (1975)

Purpose
• Skill Acquisition: Devises a skill (set of procedures) to solve a problem.
• e.g. plan to reorder blocks to a stack.

Approach
• Formulate plans to solve subgoals independently and then patch them up (e.g., to correct
  interferences where achieving one subgoal may prevent accomplishment of another).
• Solves Problems by:
  1) Searching for an appropriate procedure.
  2) If procedure does not achieve desired goal, reasons for failure are formalized as bugs.
  3) Using library of bug correction procedures, the plan is debugged.
• If no procedure is available to solve problem, a new procedure is written using the
  programming techniques library.

Key Elements of Knowledge Base
• Answer Library: problem-solving procedures
• Knowledge Library: facts about the domain
• Programming Techniques Library: to devise new problem-solving procedures
• Library of generic bugs
• Library of bug correction procedures

Key Elements of Global Data Base
• Goals
• Procedures used
• Bugs

Key Elements of Control Structure
• Search for appropriate procedures to achieve goals and correct bugs.
• Write new procedure when no appropriate procedure is found.



TABLE VI-1-3. Planners.

SYSTEM: INTERPLAN
INSTITUTION: U. of Edinburgh
AUTHORS: Tate, A. (1975)

Purpose
• Planning in the blocks world, e.g. stacking blocks.

Approach
• Formulates plans to solve subgoals independently. If achieving one subgoal prevents
  accomplishment of another and it cannot be repaired with a procedure to achieve its
  prerequisite (as in HACKER) then it reorders its subgoals. The subgoal at which failure
  occurs is promoted, i.e., moved to an earlier position in the list of subgoals to be achieved.

Key Elements of Knowledge Base
• Facts about the domain.
• Operators to achieve state changes, including information about preconditions and what
  changes operators make in world model.

Key Elements of Global Data Base
• Goals
• Operators used thus far
• Interferences noted

Key Elements of Control Structure
• Search for operators to achieve subgoals.
• Correct interferences by reordering subgoals.



TABLE VI-1-4. Planners.

SYSTEM: Not Named
INSTITUTION: SRI
AUTHORS: Waldinger, R. (1977)

Purpose
• Planning
• e.g., stack blocks

Approach
• Construct a plan by solving one conjunctive subgoal at a time.
• If a subgoal solution interferes with other goals already achieved, rather than reordering
  the conjunctive subgoals use “goal regression.” That is, move the offending subgoal back
  over previously achieved goals until it finds a place in the plan where the goal will not
  violate previously achieved goals.

Key Elements of Knowledge Base
• Facts about the domain
• Operators to achieve state changes, together with their preconditions and what changes
  they make in world model.

Key Elements of Global Data Base
• Goals
• Operators used thus far
• Interferences noted
• New subgoals (regressed goals)

Key Elements of Control Structure
• Search for operators to achieve subgoals.
• Goal regression
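
Regressing a goal over a single operator can be written compactly when operators carry add and delete lists. The following is a sketch of the standard regression step for such operators, offered as an illustration rather than as Waldinger's program; the function name `regress` and the blocks-world facts are assumptions.

```python
def regress(goal, op_pre, op_add, op_delete):
    """Return the condition that must hold BEFORE the operator so that `goal`
    holds AFTER it, or None if the operator destroys part of the goal."""
    if goal & op_delete:
        return None  # the operator would undo part of the goal
    return (goal - op_add) | op_pre


# Hypothetical blocks-world operator: stack block A on block B.
stack_pre = frozenset({"holding_A", "clear_B"})
stack_add = frozenset({"on_A_B", "clear_A", "hand_empty"})
stack_delete = frozenset({"holding_A", "clear_B"})

before = regress(frozenset({"on_A_B", "on_B_C"}), stack_pre, stack_add, stack_delete)
blocked = regress(frozenset({"clear_B"}), stack_pre, stack_add, stack_delete)
print(sorted(before), blocked)
```

In the first call, `on_A_B` is supplied by the operator itself, so only `on_B_C` and the operator's preconditions need to hold beforehand. The second call returns `None`: a goal of `clear_B` cannot be moved back over this operator, so a different position in the plan must be found for it.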


TABLE VI-1-5. Planners.

SYSTEM: ABSTRIPS
INSTITUTION: SRI
AUTHORS: Sacerdoti, E. D. (1974)

Purpose
• Devises plans for a robot to move objects between rooms.

Approach
• Do hierarchical planning by first devising a top level plan based on the key aspects of the
  problem, then successively refining it by considering less critical aspects of the problem.
• Recipe:
  1. Fix abstraction levels for solutions (plans).
  2. Problem solution proceeds top down (most abstract to most specific).
  3. Complete solution at one level and then move to next level below.

Key Elements of Knowledge Base
• Criticality assignments of elements in robot planning domain.
• Configuration of the rooms.
• Objects and their properties in the domain.
• Rules for decrementing criticality level.
• Heuristic search rules for each level.

Key Elements of Global Data Base
• Goal
• Initial state of system (criticality at maximum)
• Plans thus far.
• Current criticality level

Key Elements of Control Structure
• Goal directed (backward chaining at each level).
• Top down refinement of plans using hierarchical abstract search spaces.



TABLE VI-1-6. Planners.

SYSTEM: NOAH
INSTITUTION: SRI
AUTHORS: Sacerdoti, E.D. (1975)

Purpose
• Robot Planning System (assigns an ordering to operators in a plan, e.g., an assembly task)

Approach
• Hierarchical planner: develops hierarchy of subgoals by expanding goals. (Lowest level
  subgoals eventually expanded by problem-solving operators.)
• Expands, in parallel, individual plans for interacting subgoals, but initially assigns only a
  partial ordering to operators. Stops when interference between the partial subgoal plans is
  observed, and adjusts the ordering of the operators as needed to resolve the interference.
• Develops procedural nets to represent plans as they are developed.

Key Elements of Knowledge Base
• Rules for recognizing interference between plans.
• Rules for resolving interferences.
• Domain Knowledge
  - functions that expand goals into subgoals
  - operators to transform one state to another. Effects of actions are represented
    explicitly (via add lists and delete lists)

Key Elements of Global Data Base
• World Model
• Goal
• Subgoals
• Partial ordering of operators in subgoal plans.
• Interference between plans.

Key Elements of Control Structure
• Least commitment.
• Backward chaining.
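
The idea of leaving operators only partially ordered, and adding an ordering only when an interference is observed, can be sketched as follows. This is a simplified illustration, not NOAH's procedural-net machinery: the function `resolve_and_linearize`, the (preconditions, delete list) encoding, and the painting task are assumptions. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def resolve_and_linearize(ops, order):
    """ops maps an operator name to (preconditions, delete_list).
    order is a set of (before, after) pairs giving the initial partial order.
    If one operator deletes a fact that another needs, the one that needs the
    fact is ordered first; then one consistent linear order is returned."""
    constraints = set(order)
    names = sorted(ops)
    for a in names:
        for b in names:
            interferes = a != b and ops[a][1] & ops[b][0]
            if interferes and (a, b) not in constraints:
                constraints.add((b, a))  # b must run before a destroys what b needs
    graph = {name: set() for name in names}
    for before, after in constraints:
        graph[after].add(before)
    # static_order() raises graphlib.CycleError if the constraints conflict.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical task: two subgoal plans (paint ladder, paint ceiling) start unordered.
ops = {
    "get_paint": (frozenset(), frozenset()),
    "paint_ladder": (frozenset({"have_paint"}), frozenset({"ladder_usable"})),
    "paint_ceiling": (frozenset({"have_paint", "ladder_usable"}), frozenset()),
}
order = {("get_paint", "paint_ladder"), ("get_paint", "paint_ceiling")}
linear = resolve_and_linearize(ops, order)
print(linear)
```

Painting the ladder deletes `ladder_usable`, which painting the ceiling requires, so the sketch adds the single ordering constraint needed and otherwise commits to nothing beyond the original partial order.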
TABLE VI-1-7. Planners.

SYSTEM: MOLGEN I
INSTITUTION: Stanford U.
AUTHORS: Stefik, M.J. (1980)

Purpose
• Assist molecular geneticists in planning experiments.

Approach
• Hierarchical planner using three levels of control
  - Strategy space: switches between least commitment and heuristic guessing.
  - Design space: makes decisions about how plan is to develop (produces goals and
    constraints).
  - Planning space: contains a hierarchy of operations.
• Initially plan experiments with abstract operations (merging, amplifying, reacting and
  sorting) and general objects (gene, organism and plasmid). As specific operators or objects
  are chosen to replace the abstract ones, constraints are introduced into the plan.
• Represent interactions between subproblems as constraints.
• Formulate constraints as goals to be solved.
• Use constraint propagation to reveal interactions between subproblems.
• Suspend problem-solving as necessary, until sufficient information is derived from the
  interchange of constraints (least commitment, opportunistic expansion).
• Use heuristic guessing to make choices when there is otherwise no compelling reason to do so.
• Retract guesses as necessary when an unresolvable problem is encountered.

Key Elements of Knowledge Base
• Explicit meta-level problem-solving operators to reason with constraints.
• Problem-solving rules.
• Rules for guessing
• Rules for discovering interactions between subproblems via constraint propagation.
• Domain knowledge

Key Elements of Global Data Base
• Goals
• Partial solutions
• History of guesses and their effects.
• Constraints.

Key Elements of Control Structure
• Constraint propagation.
• Least commitment
• Heuristic guessing
• Relevant backtracking
• Use of meta-rules to reason with constraints.
• Hierarchical refinement.
• Difference reduction.


TABLE VI-1-8. Planners.

SYSTEM: MOLGEN II
INSTITUTION: Stanford U.
AUTHORS: Friedland, P.E. (1979)

Purpose
• Plan molecular genetic experiments

Approach
• Start with skeletal plan
• Instantiate each of the plan steps by a method that will work within the environment of the
  particular problem.
• Plan steps are established by choosing techniques that (in order of priority) satisfy the
  criteria:
  1. It will carry out the specific goal of the step.
  2. It can be successfully applied to the given molecule.
  3. Of the techniques satisfying criteria 1 and 2, it is the best (e.g., with respect to
     reliability, convenience, accuracy, cost, time required).

Key Elements of Knowledge Base
• Well organized expert domain knowledge represented using the UNITS package:
  - skeletal plans classified according to utility.
  - objective knowledge.
  - hierarchical organization of about 400 techniques needed to instantiate plans.

Key Elements of Global Data Base
• Goal
• Skeletal plan chosen
• Plan thus far.

Key Elements of Control Structure
• Proceed linearly thru plan matching skeletal steps to techniques by name, synonym, or
  function, and choosing the most desirable of those that match.


TABLE VI-1-9. Planners.

SYSTEM: DEVISER
INSTITUTION: JPL
AUTHORS: Vere, S. (1983)

Purpose
• General Purpose Automated Planner/Scheduler to generate parallel plans to achieve goals
  with time constraints (e.g., scheduling spacecraft actions during a planetary flyby).

Approach
• Backward chaining from unordered subgoals by:
  1) Satisfying goals where possible, by linking goal nodes with the same already achieved
     nodes.
  2) If subgoals cannot be met by linking, nodes are expanded in parallel, step by step, into
     activities which achieve the subgoals.
  3) When two parallel expansions produce contradictions, conflicts are resolved by ordering
     nodes (formerly unordered).
  4) If conflicts can’t be resolved by ordering, DEVISER backtracks to the last choice point
     and tries another alternative.
• A start window for each activity in the plan is updated dynamically during plan generation,
  in order to maintain consistency with the windows and durations of adjacent goals and
  activities.

Key Elements of Knowledge Base
• Rules for recognizing interferences between subgoal expansions.
• Rules for reordering subgoal plans to resolve conflicts.
• Domain Knowledge:
  - Operators to transform one state to another. Effects of actions are represented
    explicitly by add lists and delete lists.
  - Goal windows and durations
  - Event schedules

Key Elements of Global Data Base
• World Model
• Subgoals
• Ordering of operators in subgoal plans.
• Interferences between subgoal plans.
• Node expansion histories.
• Current windows.

Key Elements of Control Structure
• Least commitment
• Backward chaining
• Dynamic maintenance of windows of activities and goals to preserve consistency.
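
The dynamic maintenance of start windows can be illustrated with a small propagation routine. This is a hedged sketch of the general idea only, not DEVISER's algorithm: the function `tighten_windows`, the (earliest start, latest start) encoding, and the spacecraft activities are assumptions made for the example.

```python
def tighten_windows(windows, durations, precedes):
    """windows maps an activity to (earliest_start, latest_start).
    precedes lists (first, second) pairs: `first` must finish before `second` starts.
    Windows are narrowed until consistent; ValueError is raised if one becomes empty."""
    w = dict(windows)
    changed = True
    while changed:
        changed = False
        for first, second in precedes:
            e1, l1 = w[first]
            e2, l2 = w[second]
            new_e2 = max(e2, e1 + durations[first])  # second cannot start before first can finish
            new_l1 = min(l1, l2 - durations[first])  # first must finish by second's latest start
            if new_e2 > l2 or new_l1 < e1:
                raise ValueError(f"no consistent start window for {first!r} before {second!r}")
            if new_e2 != e2 or new_l1 != l1:
                w[first] = (e1, new_l1)
                w[second] = (new_e2, l2)
                changed = True
    return w


# Hypothetical flyby activities: the camera must slew before it can image.
windows = {"slew": (0, 10), "image": (0, 12)}
durations = {"slew": 4, "image": 2}
result = tighten_windows(windows, durations, [("slew", "image")])
print(result)
```

Each ordering decision made during planning would trigger a pass of this kind, so that an empty window (signalled here by the exception) exposes a time conflict as soon as it arises.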




TABLE VI-1-10. Planners.

SYSTEM: KNOBS
INSTITUTION: MITRE
AUTHORS: Engelman, C. et al. (1980)

Purpose
• Planning Consultant for A.F. Tactical Missions.
• Other domains include:
  - Naval “show of flag” missions.
  - Scheduling of crew activities for the NASA space shuttle.

Approach
• Assist a user by interactively accepting mission data and using it to instantiate a
  stereotypical solution to user’s problem, checking input for inconsistencies and oversights.
• Represent the stereotypical missions as frames. The checks are constraints among the
  possible slot values in such frames.
• Uses a natural language interface (APE II).

Key Elements of Knowledge Base
• Targets stored hierarchically in frames
  - individual targets inherit from generic targets.
• Frames representing prototypical missions and sub-missions.
• Resource data
  - frames representing static descriptions of object attributes, with inheritances via
    linkage to more generic frames.
• Scripts composed of causally linked chains.
• The overall knowledge base network consists of several thousand frames.
• Rules for instantiation of frames and slots.
• etc.

Key Elements of Global Data Base
• Target
• Airbase from which to fly mission.
• Type of Aircraft
• Armaments

Key Elements of Control Structure
• Frame instantiation uses rules and constraints.
• Backward chaining of production rules in a MYCIN-like deductive manner to manage such
  generic choices as aircraft, weapons, support, and electronic countermeasures.
• Inference mechanism uses a syntactic pattern matcher with provisions for restrictions on
  variable instantiations.
TABLE VI-1-11. Planners.

SYSTEM: ISIS-II
INSTITUTION: CMU
AUTHORS: Fox, Allen & Strom (1982)

Purpose
• Job-Shop Planning/Scheduling of Parts Production

Approach
• Generate schedules by heuristic search using evaluation functions based on constraints
  associated with costs, process applicability, machine availability, and supervisor
  preferences.
• Set up mechanism to dynamically relax constraints as required.
• Sequence for Generating a Schedule:
  1. Use constraints to perform a rule-based pre-search analysis to bound search.
  2. Do a constraint-directed “beam-search” where only the top-rated “n” partial paths are
     saved.
  3. Perform post-search analysis to determine if search was effective.

Key Elements of Knowledge Base
• Constraints (and their importance)
  - Organization goals (associated with profit).
  - Physical constraints
  - Gating constraints (preconditions for object applicability or process initiation).

Key Elements of Global Data Base
• Preference constraints
  - queue positions
  - machine preferences
• Partial paths and their evaluations.
• Work in progress
• Shop status
• Goals, due dates, and attributes of parts to be manufactured.

Key Elements of Control Structure
• Pre-search pruning of search space, based on constraints
• Beam Search using evaluation functions based on constraints


Figure VI-5. Simplified Flow Chart of Deviser's Core Planner.